Cargando…

YOLO Series for Human Hand Action Detection and Classification from Egocentric Videos

Hand detection and classification is a very important pre-processing step in building applications based on three-dimensional (3D) hand pose estimation and hand activity recognition. To automatically limit the hand data area on egocentric vision (EV) datasets, especially to see the development and p...

Descripción completa

Detalles Bibliográficos
Autores principales: Nguyen, Hung-Cuong, Nguyen, Thi-Hao, Scherer, Rafał, Le, Van-Hung
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10058182/
https://www.ncbi.nlm.nih.gov/pubmed/36991971
http://dx.doi.org/10.3390/s23063255
Descripción
Sumario:Hand detection and classification is a very important pre-processing step in building applications based on three-dimensional (3D) hand pose estimation and hand activity recognition. To automatically limit the hand data area on egocentric vision (EV) datasets, especially to see the development and performance of the “You Only Live Once” (YOLO) network over the past seven years, we propose a study comparing the efficiency of hand detection and classification based on the YOLO-family networks. This study is based on the following problems: (1) systematizing all architectures, advantages, and disadvantages of YOLO-family networks from version (v)1 to v7; (2) preparing ground-truth data for pre-trained models and evaluation models of hand detection and classification on EV datasets (FPHAB, HOI4D, RehabHand); (3) fine-tuning the hand detection and classification model based on the YOLO-family networks, hand detection, and classification evaluation on the EV datasets. Hand detection and classification results on the YOLOv7 network and its variations were the best across all three datasets. The results of the YOLOv7-w6 network are as follows: FPHAB is P = 97% with Thesh(IOU) = 0.5; HOI4D is P = 95% with Thesh(IOU) = 0.5; RehabHand is larger than 95% with Thesh(IOU) = 0.5; the processing speed of YOLOv7-w6 is 60 fps with a resolution of 1280 × 1280 pixels and that of YOLOv7 is 133 fps with a resolution of 640 × 640 pixels.