Normalized Mean Error (NME) is one of the most popular evaluation metrics in facial landmark detection benchmarks. However, the commonly used loss functions (L1 and L2) are not designed to optimize NME directly, so there may be a gap between minimizing the distance losses used to regress landmark coordinates and minimizing the metric itself. In this paper, we address this issue and propose a novel loss function, the Enhanced Normalized Mean Error (ENME) loss, which takes into account both the final metric and an attention mechanism over different NME intervals. To evaluate the effectiveness of the proposed loss, we design and train a lightweight regression model that we call the Thin Residual Network (TRNet). Extensive experiments on three popular public datasets, AFLW, COFW, and the challenging 300W, show that TRNet trained with the ENME loss achieves better performance than state-of-the-art methods.
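To make the gap between L2-style losses and NME concrete, the following is a minimal sketch of the standard NME computation (mean point-to-point Euclidean error divided by a normalizing distance) next to a sum-of-squared-coordinate-errors loss. It is not the ENME loss, whose exact form is not given here. The function names `nme` and `l2_loss` are hypothetical, and the normalizing distance `norm` is an assumption passed in by the caller, since benchmarks differ in how they define it (for example, inter-ocular distance versus face bounding-box size).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def nme(pred: Sequence[Point], gt: Sequence[Point], norm: float) -> float:
    """Mean Euclidean landmark error divided by a normalizing distance.

    `norm` is an assumption supplied by the caller; its definition varies
    by benchmark (e.g. inter-ocular distance or face box size).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if norm <= 0:
        raise ValueError("norm must be positive")
    # Average of per-landmark Euclidean distances, then normalize.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / (len(gt) * norm)


def l2_loss(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Squared coordinate error summed over x and y, averaged per landmark."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    total = sum((px - gx) ** 2 + (py - gy) ** 2
                for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(gt)


gt = [(0.0, 0.0), (0.0, 0.0)]
# Case A: one landmark off by 2 pixels, the other exact.
pred_a = [(2.0, 0.0), (0.0, 0.0)]
# Case B: both landmarks off by (1, 1), i.e. sqrt(2) pixels each.
pred_b = [(1.0, 1.0), (1.0, 1.0)]

print(l2_loss(pred_a, gt), l2_loss(pred_b, gt))      # equal L2 loss
print(nme(pred_a, gt, 1.0), nme(pred_b, gt, 1.0))    # different NME
```

Both predictions have the same L2 loss (2.0), yet their NME values differ (1.0 versus about 1.414 with `norm = 1`). A model trained only on the L2 loss is therefore indifferent between two predictions that the evaluation metric ranks differently, which is the kind of mismatch the abstract describes.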