Body Part Alignment and Temporal Attention Pooling for Video-Based Person Re-Identification

Michael Jones (Mitsubishi Electric Research Laboratories), Sai Saketh Rambhatla (University of Maryland)

Abstract
We present a novel deep neural network for video-based person re-identification that addresses two of the major issues that make this problem difficult. The first is misalignment between cropped images of people. To handle this, we use the OpenPose network to localize different body parts so that corresponding regions of feature maps can be compared. The second is bad frames in a video sequence, typically frames in which the person is occluded, poorly localized, or badly blurred. To handle these, we design a temporal attention network that analyzes the feature maps of multiple frames and assigns a weight to each frame, so that more useful frames contribute more to the aggregated feature vector representing the entire sequence. The resulting deep network improves over the state of the art on all three standard test sets for video-based person re-id (PRID2011, iLIDS-VID and MARS).
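
The following is a minimal PyTorch sketch, not the authors' exact architecture, of the temporal attention pooling idea described in the abstract: each frame's feature vector is scored by a small network, the scores are softmax-normalized over time, and the weighted average becomes the sequence-level descriptor. All module names, layer sizes, and feature dimensions below are illustrative assumptions.

# Minimal sketch of temporal attention pooling over per-frame features.
# Module names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAttentionPooling(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=256):
        super().__init__()
        # Small scoring network that produces one scalar attention score per frame.
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) per-frame feature vectors.
        scores = self.score(frame_feats)             # (batch, num_frames, 1)
        weights = F.softmax(scores, dim=1)           # normalize over the temporal axis
        pooled = (weights * frame_feats).sum(dim=1)  # (batch, feat_dim) sequence descriptor
        return pooled, weights.squeeze(-1)

# Usage: aggregate 8-frame clips into one sequence-level vector each.
feats = torch.randn(4, 8, 2048)                      # e.g. backbone outputs for 4 clips
pool = TemporalAttentionPooling()
seq_feat, frame_weights = pool(feats)
print(seq_feat.shape, frame_weights.shape)           # torch.Size([4, 2048]) torch.Size([4, 8])

Weighting frames this way lets occluded or blurred frames receive low weight instead of corrupting the sequence descriptor, which is the motivation given in the abstract.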

DOI
10.5244/C.33.115
https://dx.doi.org/10.5244/C.33.115

Files
Paper (PDF)

BibTeX
@inproceedings{BMVC2019,
  title={Body Part Alignment and Temporal Attention Pooling for Video-Based Person Re-Identification},
  author={Michael Jones and Sai Saketh Rambhatla},
  year={2019},
  month={September},
  pages={115.1--115.12},
  articleno={115},
  numpages={12},
  booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
  publisher={BMVA Press},
  editor={Kirill Sidorov and Yulia Hicks},
  doi={10.5244/C.33.115},
  url={https://dx.doi.org/10.5244/C.33.115}
}