For object detection, multi-stage frameworks (e.g., Cascade R-CNN) have achieved excellent performance compared with one- and two-stage frameworks (e.g., FPN). In this work, we introduce an LSTM-based proposal refinement module that iteratively refines proposed bounding boxes. The module integrates naturally with different frameworks, and the number of refinement steps is flexible and can differ between training and testing. We focus on improving the widely used two-stage frameworks by replacing the original bounding box regression head with our proposed module. To verify the efficacy of our method, we perform extensive experiments on the PASCAL VOC and MS COCO benchmarks with both ResNet-50 and ResNet-101 backbones. The results show that, with our LSTM-based module, detectors achieve significantly higher mAP than the vanilla R-FCN and FPN on both benchmarks. The module also outperforms the existing state-of-the-art method Cascade R-CNN, especially under high IoU thresholds.
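The iterative refinement loop described above can be illustrated with a minimal sketch. It is not the module from this work. It assumes the standard R-CNN box-delta parameterization, and it replaces the LSTM with a hypothetical toy predictor (`make_toy_predictor`) that moves a box a fixed fraction of the way toward a known target. All function names are illustrative assumptions. The `state` argument only marks where a recurrent hidden state would be carried between steps.

```python
import math

def _center_form(box):
    """Convert (x1, y1, x2, y2) to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h

def encode_deltas(box, target):
    """R-CNN style deltas (dx, dy, dw, dh) that map `box` onto `target`."""
    cx, cy, w, h = _center_form(box)
    tcx, tcy, tw, th = _center_form(target)
    return ((tcx - cx) / w, (tcy - cy) / h, math.log(tw / w), math.log(th / h))

def apply_deltas(box, deltas):
    """Decode deltas against `box` and return the refined (x1, y1, x2, y2)."""
    cx, cy, w, h = _center_form(box)
    dx, dy, dw, dh = deltas
    ncx, ncy = cx + dx * w, cy + dy * h
    nw, nh = w * math.exp(dw), h * math.exp(dh)
    return (ncx - 0.5 * nw, ncy - 0.5 * nh, ncx + 0.5 * nw, ncy + 0.5 * nh)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def refine(box, predict_step, num_steps, state=None):
    """Apply `predict_step` repeatedly; returns the box after every step.

    `predict_step(box, state)` must return (deltas, new_state). In a learned
    module this is where a recurrent cell would run; here it is any callable.
    """
    boxes = [box]
    for _ in range(num_steps):
        deltas, state = predict_step(box, state)
        box = apply_deltas(box, deltas)
        boxes.append(box)
    return boxes

def make_toy_predictor(target, gain=0.5):
    """Hypothetical stand-in for a learned predictor (assumption, not an LSTM).

    It returns a fraction `gain` of the exact deltas toward `target`.
    """
    def predict_step(box, state):
        full = encode_deltas(box, target)
        return tuple(gain * d for d in full), state
    return predict_step

if __name__ == "__main__":
    proposal = (10.0, 10.0, 50.0, 60.0)
    ground_truth = (20.0, 15.0, 80.0, 90.0)
    step_fn = make_toy_predictor(ground_truth)
    # The step count is just an argument, so it can differ between runs.
    for steps in (1, 3, 5):
        final_box = refine(proposal, step_fn, steps)[-1]
        print(steps, round(iou(final_box, ground_truth), 3))
```

Because the step count is an ordinary argument to `refine`, the same predictor can be run for a different number of steps at training and at test time, which is the flexibility the abstract refers to.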