SRN: Stacked Regression Network for Real-time 3D Hand Pose Estimation

Pengfei Ren (Beijing University of Posts and Telecommunications), Haifeng Sun (Beijing University of Posts and Telecommunications), Jingyu Wang (Beijing University of Posts and Telecommunications), Qi Qi (Beijing University of Posts and Telecommunications), Weiting Huang (Beijing University of Posts and Telecommunications)

Abstract
Most recent state-of-the-art methods take 3D input data, because 3D data capture more spatial information than a depth image. However, these methods require either a complex network structure or time-consuming data preprocessing and post-processing. We present a simple and accurate method for 3D hand pose estimation from a 2D depth image. This is achieved by a differentiable re-parameterization module, which constructs 3D heatmaps and unit vector fields directly from joint coordinates. Taking these spatial-aware representations as intermediate features, we can easily stack multiple regression modules to capture the spatial structure of depth data efficiently, yielding accurate and robust estimation. Furthermore, we explore multiple good practices that improve the performance of a 2D CNN for 3D hand pose estimation. Experiments on four challenging hand pose datasets show that our proposed method outperforms all state-of-the-art methods.
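
The re-parameterization step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function name `reparameterize`, the normalized (u, v, d) coordinate convention in [-1, 1], the linear fall-off heatmap, and the `radius` parameter are all hypothetical and are not taken from the paper. The sketch only shows the general idea that per-pixel heatmaps and unit vectors can be computed from joint coordinates using smooth array operations, so the same computation written in an autodiff framework would let gradients flow back to the joints.

```python
import numpy as np

def reparameterize(joints, depth, radius=0.8, eps=1e-8):
    """Build per-joint heatmaps and unit vector fields from joint coordinates.

    joints: (J, 3) array of normalized (u, v, d) coordinates in [-1, 1] (assumed convention).
    depth:  (H, W) array of normalized depth values (assumed convention).
    Returns heatmaps of shape (J, H, W) and unit vectors of shape (J, 3, H, W).
    """
    h, w = depth.shape
    # Pixel grid in normalized image coordinates: v varies along rows, u along columns.
    v, u = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    points = np.stack([u, v, depth], axis=0)            # (3, H, W)

    # Offset from every pixel's 3D point to every joint.
    offsets = joints[:, :, None, None] - points[None]   # (J, 3, H, W)
    dist = np.linalg.norm(offsets, axis=1)              # (J, H, W)

    # Hypothetical heatmap: 1 at the joint, falling linearly to 0 at `radius`.
    heatmaps = np.clip(1.0 - dist / radius, 0.0, 1.0)

    # Unit direction toward the joint, zeroed outside the heatmap support.
    unit_vectors = offsets / (dist[:, None] + eps)
    unit_vectors = unit_vectors * (heatmaps[:, None] > 0)
    return heatmaps, unit_vectors
```

Under these assumptions, the outputs have the same spatial resolution as the depth image, so they could be concatenated with image features and passed to a subsequent regression stage, which is the stacking idea the abstract describes.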

DOI
10.5244/C.33.176
https://dx.doi.org/10.5244/C.33.176

Files
Paper (PDF)
Supplementary material (PDF)

BibTeX
@inproceedings{BMVC2019,
title={SRN: Stacked Regression Network for Real-time 3D Hand Pose Estimation},
author={Pengfei Ren and Haifeng Sun and Jingyu Wang and Qi Qi and Weiting Huang},
year={2019},
month={September},
pages={176.1--176.14},
articleno={176},
numpages={14},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.176},
url={https://dx.doi.org/10.5244/C.33.176}
}