MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language

Hamid Vaezi Joze (Microsoft), Oscar Koller (Microsoft)

Abstract
Sign language recognition is a challenging and often underestimated problem comprising multi-modal articulators (handshape, orientation, movement, upper body, and face) that integrate asynchronously across multiple streams. Learning powerful statistical models in such a scenario requires large amounts of data, particularly to apply recent advances in the field. However, labeled data is a scarce resource for sign language due to the enormous cost of transcribing these unwritten languages. We propose the first real-life large-scale sign language data set, comprising over 25,000 annotated videos, which we thoroughly evaluate with state-of-the-art methods from sign and related action recognition. Unlike the current state of the art, the data set allows investigating generalization to unseen individuals (signer-independent test) in a realistic setting with over 200 signers. Previous work mostly deals with limited-vocabulary tasks, while here we cover a large class count of 1000 signs in challenging and unconstrained real-life recording conditions. We further propose I3D, known from video classification, as a powerful and suitable architecture for sign language recognition, outperforming the current state of the art by a large margin. The data set is publicly available to the community.

DOI
10.5244/C.33.41
https://dx.doi.org/10.5244/C.33.41

Files
Paper (PDF)

BibTeX
@inproceedings{BMVC2019,
title={MS-ASL: A Large-Scale Data Set and Benchmark for Understanding American Sign Language},
author={Hamid Vaezi Joze and Oscar Koller},
year={2019},
month={September},
pages={41.1--41.16},
articleno={41},
numpages={16},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},
doi={10.5244/C.33.41},
url={https://dx.doi.org/10.5244/C.33.41}
}