Referring Expression Object Segmentation with Caption-Aware Consistency
Yi-Wen Chen (Academia Sinica), Yi-Hsuan Tsai (NEC Labs America), Tiantian Wang (University of California at Merced), Yen-Yu Lin (Academia Sinica), Ming-Hsuan Yang (University of California at Merced)
Abstract
Referring expressions are natural language descriptions that identify a particular object within a scene and are widely used in our daily conversations. In this work, we focus on segmenting the object in an image specified by a referring expression. To this end, we propose an end-to-end trainable comprehension network that consists of language and visual encoders to extract feature representations from both domains. We introduce spatial-aware dynamic filters that transfer knowledge from the language domain to the visual one and effectively capture the spatial information of the specified object. To further improve communication between the language and visual modules, we employ a caption generation network that takes features shared across both domains as input and improves both representations via a consistency constraint that enforces the generated sentence to be similar to the original query. We evaluate the proposed framework on three referring expression datasets and show that our method performs favorably against state-of-the-art algorithms.
DOI
10.5244/C.33.30
https://dx.doi.org/10.5244/C.33.30
BibTeX
@inproceedings{BMVC2019,
title={Referring Expression Object Segmentation with Caption-Aware Consistency},
author={Yi-Wen Chen and Yi-Hsuan Tsai and Tiantian Wang and Yen-Yu Lin and Ming-Hsuan Yang},
year={2019},
month={September},
pages={30.1--30.12},
articleno={30},
numpages={12},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.30},
url={https://dx.doi.org/10.5244/C.33.30}
}