Working Hands: A Hand-Tool Assembly Dataset for Image Segmentation and Activity Mining

Roy Shilkrot (Stony Brook University), Supreeth Narasimhaswamy (Stony Brook University), Saif Vazir (Stony Brook University), Minh Hoai Nguyen (Stony Brook University)

Computer vision in manufacturing has for decades focused on automatic inspection and verification of work pieces, while visual recognition of the human operators is becoming increasingly prominent. Semantic segmentation is an exemplary vision task that is key to enabling crucial assembly applications such as completion time tracking and manual process verification. However, little work has addressed the segmentation of human hands performing complex tasks such as manual assembly. Segmenting hands from tools, work pieces, background, and other body parts is difficult because of self-occlusions and intricate hand grips and poses. In this paper we introduce Working Hands, a dataset of pixel-level annotated images of hands performing 13 different tool-based assembly tasks, drawn from both real-world captures and virtual-world renderings, with RGB+D images from a high-resolution range camera and a ray-casting engine. Moreover, using the dataset, we can learn a generic Hand-Task Descriptor that is useful for retrieving images and videos of hands performing similar operations across different non-annotated datasets.



title={Working Hands: A Hand-Tool Assembly Dataset for Image Segmentation and Activity Mining},
author={Roy Shilkrot and Supreeth Narasimhaswamy and Saif Vazir and Minh Hoai Nguyen},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Kirill Sidorov and Yulia Hicks},