Feature Pyramid Encoding Network for Real-time Semantic Segmentation

Mengyu Liu (University of Manchester), Hujun Yin (University of Manchester)

Abstract
Although current deep learning methods have achieved impressive results for semantic segmentation, they incur high computational costs and have a huge number of parameters. For real-time applications, inference speed and memory usage are two important factors. To address this challenge, we propose a lightweight feature pyramid encoding network (FPENet) that strikes a good trade-off between accuracy and speed. Specifically, we use a feature pyramid encoding block to encode multi-scale contextual features with depthwise dilated convolutions in all stages of the encoder. A mutual embedding upsample module is introduced in the decoder to aggregate the high-level semantic features and low-level spatial details efficiently. The proposed network outperforms existing real-time methods with fewer parameters and improved inference speed on the Cityscapes and CamVid benchmark datasets. In particular, FPENet achieves 68.0% mean IoU on the Cityscapes test set with only 0.4M parameters and a speed of 102 FPS on an NVIDIA TITAN V GPU.
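
The abstract attributes the small parameter count to depthwise dilated convolutions. The following is a minimal sketch of why that combination is cheap, not the authors' implementation: it counts the weights in a standard versus a depthwise 3x3 convolution and computes the spatial extent a dilated kernel covers. The channel width (64), the dilation rates (1, 2, 4, 8), and the helper names `conv_params` and `effective_kernel` are assumptions chosen for illustration and are not stated in the abstract.

```python
def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Number of learnable parameters in a 2D convolution with a k x k kernel.

    With groups == c_in == c_out the convolution is depthwise: each output
    channel sees exactly one input channel.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def effective_kernel(k, dilation):
    """Spatial extent (in pixels) spanned by a k x k kernel with this dilation."""
    return dilation * (k - 1) + 1


channels = 64  # hypothetical width, for illustration only

# Standard 3x3 convolution: every output channel mixes all input channels.
standard = conv_params(channels, channels, 3)

# Depthwise 3x3 convolution: one 3x3 filter per channel.
depthwise = conv_params(channels, channels, 3, groups=channels)

# Dilation widens the receptive field without adding any weights.
extents = {d: effective_kernel(3, d) for d in (1, 2, 4, 8)}  # assumed rates

print(standard, depthwise, extents)
```

Under these assumed settings, the depthwise layer uses 64 times fewer weights than the standard one, and the weight count is the same at every dilation rate, which is one way to gather multi-scale context at little parameter cost.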

DOI
10.5244/C.33.203
https://dx.doi.org/10.5244/C.33.203

Files
Paper (PDF)

BibTeX
@inproceedings{BMVC2019,
title={Feature Pyramid Encoding Network for Real-time Semantic Segmentation},
author={Mengyu Liu and Hujun Yin},
year={2019},
month={September},
pages={203.1--203.13},
articleno={203},
numpages={13},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.203},
url={https://dx.doi.org/10.5244/C.33.203}
}