Relation-aware Multiple Attention Siamese Networks for Robust Visual Tracking

Fangyi Zhang (Chinese Academy of Sciences), Bingpeng Ma (Chinese Academy of Sciences), Hong Chang (Chinese Academy of Sciences), Shiguang Shan (Chinese Academy of Sciences), Xilin Chen (Institute of Computing Technology, Chinese Academy of Sciences)

Partial occlusion is a challenging problem in visual object tracking. Neither Siamese-network-based trackers nor conventional part-based trackers address it successfully. Inspired by the fact that attention can make a model focus on the most salient regions of an image, we propose a new method named Relation-aware Multiple Attention (RMA) to address the partial occlusion problem. In the RMA module, part features generated from a set of attention maps represent the discriminative parts of the target and ignore the occluded ones. An attention regularization term forces the multiple attention maps to localize diverse local patterns. In addition, we incorporate relation-aware compensation, which adaptively aggregates and distributes part features to capture the semantic dependencies among them. We integrate the RMA module into Siamese matching networks and verify the superior performance of the resulting RMA-Siam tracker on five visual tracking benchmarks: VOT-2016, VOT-2017, LaSOT, OTB-2015 and TrackingNet.
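
To make the idea of attention-derived part features and a diversity-encouraging regularizer concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the form of the regularization term, so the overlap penalty used here (squared off-diagonal entries of the Gram matrix of the normalized attention maps) is an assumption, as are the function names `softmax_spatial`, `part_features` and `diversity_penalty` and the choice of spatial softmax. Relation-aware compensation is not modeled.

```python
import numpy as np


def softmax_spatial(logits):
    """Normalize each of K attention maps (K, H, W) to sum to 1 over space."""
    k = logits.shape[0]
    flat = logits.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(flat)
    return (exp / exp.sum(axis=1, keepdims=True)).reshape(logits.shape)


def part_features(features, attention):
    """Attention-weighted pooling.

    features:  (C, H, W) backbone feature map
    attention: (K, H, W) attention maps, one per part
    returns:   (K, C) one pooled feature vector per part
    """
    c = features.shape[0]
    k = attention.shape[0]
    return attention.reshape(k, -1) @ features.reshape(c, -1).T


def diversity_penalty(attention):
    """Assumed regularizer: penalize overlap between different attention maps.

    Each map is L2-normalized, then the squared off-diagonal entries of the
    K x K Gram matrix are summed. The value is 0 when the maps have disjoint
    support and grows as the maps become more similar.
    """
    k = attention.shape[0]
    flat = attention.reshape(k, -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    gram = flat @ flat.T
    off_diag = gram - np.diag(np.diag(gram))
    return float((off_diag ** 2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 5, 5))                    # C=8 channels
    att = softmax_spatial(rng.normal(size=(3, 5, 5)))     # K=3 parts
    print(part_features(feats, att).shape)
    print(diversity_penalty(att))
```

Under this assumed penalty, attention maps that all collapse onto the same region are penalized, which pushes each map toward a different local pattern; if one part is occluded, the remaining part features still describe the visible regions.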



title={Relation-aware Multiple Attention Siamese Networks for Robust Visual Tracking},
author={Fangyi Zhang and Bingpeng Ma and Hong Chang and Shiguang Shan and Xilin Chen},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},