Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition

Hao Huang (University of Rochester), Luowei Zhou (University of Michigan), Wei Zhang (University of Rochester), Jason Corso (University of Michigan), Chenliang Xu (University of Rochester)

Abstract
Video action recognition, a critical problem in video understanding, has been attracting increasing attention. To identify actions induced by complex object-object interactions, we need to consider not only the spatial relations among objects within a single frame but also the temporal relations among the same or different objects across multiple frames. However, existing approaches that model video representations and non-local features either cannot explicitly model relations at the object-object level or cannot handle streaming videos. In this paper, we propose a novel dynamic hidden graph module to model complex object-object interactions in videos, of which we consider two instantiations: a visual graph that captures appearance/motion changes among objects and a location graph that captures relative spatiotemporal position changes among objects. Moreover, the proposed graph module allows us to process streaming videos, setting it apart from existing methods. Experimental results on two benchmark datasets, Something-Something and ActivityNet, demonstrate the competitive performance of our methods.
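To make the idea of a graph module over object features concrete, below is a minimal sketch of a generic object-object graph reasoning block in PyTorch. It is not the authors' implementation: the class name ObjectGraphModule, the tensor shapes, and the simple dot-product affinity are assumptions made purely for illustration of how per-frame object features can be related across frames and aggregated back onto each object.

# Minimal illustrative sketch (not the paper's code): relate N object features
# per frame across T frames via a learned soft affinity graph, then aggregate
# neighbor features back onto each node. Names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectGraphModule(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.query = nn.Linear(feat_dim, hidden_dim)   # node embeddings used to score edges
        self.key = nn.Linear(feat_dim, hidden_dim)
        self.value = nn.Linear(feat_dim, feat_dim)      # message passed along edges
        self.out = nn.Linear(feat_dim, feat_dim)

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (B, T, N, D) -- B clips, T frames, N objects, D-dim features
        B, T, N, D = obj_feats.shape
        nodes = obj_feats.reshape(B, T * N, D)          # all objects in a clip become graph nodes

        q = self.query(nodes)                           # (B, T*N, H)
        k = self.key(nodes)                             # (B, T*N, H)
        affinity = torch.bmm(q, k.transpose(1, 2))      # pairwise object-object affinities
        adj = F.softmax(affinity / q.size(-1) ** 0.5, dim=-1)  # row-normalized soft adjacency

        messages = torch.bmm(adj, self.value(nodes))    # aggregate features from related objects
        updated = nodes + self.out(messages)            # residual update of each node
        return updated.reshape(B, T, N, D)


if __name__ == "__main__":
    feats = torch.randn(2, 8, 5, 512)                   # e.g. 8 frames, 5 object proposals, 512-d features
    module = ObjectGraphModule(feat_dim=512)
    print(module(feats).shape)                          # torch.Size([2, 8, 5, 512])

In the paper's terms, a visual graph would build such affinities from object appearance/motion features, while a location graph would build them from relative spatiotemporal positions; the sketch above shows only the generic message-passing skeleton shared by both.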

DOI
10.5244/C.33.101
https://dx.doi.org/10.5244/C.33.101

Files
Paper (PDF)
Supplementary material (PDF)

BibTeX
@inproceedings{BMVC2019,
title={Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition},
author={Hao Huang and Luowei Zhou and Wei Zhang and Jason Corso and Chenliang Xu},
year={2019},
month={September},
pages={101.1--101.12},
articleno={101},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.101},
url={https://dx.doi.org/10.5244/C.33.101}
}