Spatio-Temporal Associative Representation for Video Person Re-Identification

Guile Wu (Queen Mary University of London), Xiatian Zhu (Samsung AI Centre, Cambridge), Shaogang Gong (Queen Mary University of London)

Abstract
Learning discriminative spatio-temporal representations is key to solving video person re-identification (re-id) challenges. Most existing methods focus on learning appearance features and/or selecting image frames, but ignore optimising the compatibility and interaction of appearance and motion attentive information. To address this limitation, we propose a novel model for learning a Spatio-Temporal Associative Representation (STAR). We design local frame-level spatio-temporal association to learn discriminative attentive appearance and short-term motion features, and global video-level spatio-temporal association to form a compact and discriminative holistic video representation. We further introduce a pyramid ranking regulariser to facilitate end-to-end model optimisation. Extensive experiments demonstrate the superiority of STAR over state-of-the-art methods on four video re-id benchmarks: MARS, DukeMTMC-VideoReID, iLIDS-VID and PRID-2011.
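
As an informal illustration of the general recipe the abstract describes (per-frame attentive features aggregated into a video-level representation, trained with a ranking objective), the minimal PyTorch sketch below combines a generic temporal attention pooling module with a conventional triplet ranking loss. All names and sizes here (TemporalAttentionPool, triplet_ranking_loss, the 128-d features) are hypothetical stand-ins for exposition, not the authors' STAR architecture or their pyramid ranking regulariser.

# Minimal sketch, NOT the paper's implementation: generic temporal
# attention pooling plus a conventional triplet ranking loss as a
# stand-in for the pyramid ranking regulariser described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionPool(nn.Module):
    """Aggregate T per-frame feature vectors into one video descriptor."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)  # per-frame importance score

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, T, feat_dim)
        weights = F.softmax(self.score(frame_feats), dim=1)  # (batch, T, 1)
        return (weights * frame_feats).sum(dim=1)            # (batch, feat_dim)

def triplet_ranking_loss(anchor, positive, negative, margin=0.3):
    """Standard margin-based ranking loss (illustrative only)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

if __name__ == "__main__":
    pool = TemporalAttentionPool(feat_dim=128)
    clips = torch.randn(3, 8, 128)   # 3 tracklets, 8 frames, 128-d features
    video_repr = pool(clips)
    print(video_repr.shape)          # torch.Size([3, 128])

In this generic setup, the attention weights let informative frames dominate the pooled video descriptor, while the ranking loss pulls same-identity tracklets together and pushes different identities apart in the embedding space.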

DOI
10.5244/C.33.62
https://dx.doi.org/10.5244/C.33.62


BibTeX
@inproceedings{BMVC2019,
title={Spatio-Temporal Associative Representation for Video Person Re-Identification},
author={Guile Wu and Xiatian Zhu and Shaogang Gong},
year={2019},
month={September},
pages={62.1--62.13},
articleno={62},
numpages={13},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.62},
url={https://dx.doi.org/10.5244/C.33.62}
}