Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning
In this study, we propose a deep learning framework and a self-supervision scheme for video-based surgical gesture recognition. The proposed framework is modular. First, a 3D convolutional network extracts feature vectors from video clips for encoding spatial and short-term temporal features. Second...
Main Authors: | Gazis, Athanasios; Karaiskos, Pantelis; Loukas, Constantinos |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9774918/ https://www.ncbi.nlm.nih.gov/pubmed/36550943 http://dx.doi.org/10.3390/bioengineering9120737 |
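The abstract describes a two-stage backbone: a 3D convolutional network encodes each short video clip into a feature vector, and a transformer encoder models long-term temporal dependencies across the sequence of clip features before classifying the gesture of each clip. The following is a minimal PyTorch sketch of that kind of pipeline; the module names, layer sizes, and hyperparameters are illustrative assumptions and do not reproduce the authors' C3DTrans or SSC3DTrans implementations.

```python
# Minimal sketch of a clip-level 3D-CNN + transformer gesture classifier.
# All architectural details below are illustrative assumptions, not the
# authors' C3DTrans/SSC3DTrans code.
import torch
import torch.nn as nn


class Clip3DEncoder(nn.Module):
    """Encodes one clip (B, C, T, H, W) into a feature vector (B, D)."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global spatio-temporal pooling
        )
        self.proj = nn.Linear(128, feat_dim)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        return self.proj(self.conv(clip).flatten(1))


class GestureTransformer(nn.Module):
    """Attends over a sequence of clip features and classifies each clip."""

    def __init__(self, num_gestures: int, feat_dim: int = 512, n_layers: int = 2):
        super().__init__()
        self.clip_encoder = Clip3DEncoder(feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, num_gestures)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (B, S, C, T, H, W) -> per-clip gesture logits (B, S, num_gestures)
        b, s = clips.shape[:2]
        feats = self.clip_encoder(clips.flatten(0, 1)).view(b, s, -1)
        return self.head(self.temporal(feats))


if __name__ == "__main__":
    model = GestureTransformer(num_gestures=6)      # gesture count is illustrative
    dummy = torch.randn(1, 10, 3, 16, 64, 64)       # 10 clips of 16 frames each
    print(model(dummy).shape)                       # torch.Size([1, 10, 6])
```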
_version_ | 1784855516817653760 |
---|---|
author | Gazis, Athanasios; Karaiskos, Pantelis; Loukas, Constantinos
author_facet | Gazis, Athanasios; Karaiskos, Pantelis; Loukas, Constantinos
author_sort | Gazis, Athanasios |
collection | PubMed |
description | In this study, we propose a deep learning framework and a self-supervision scheme for video-based surgical gesture recognition. The proposed framework is modular. First, a 3D convolutional network extracts feature vectors from video clips for encoding spatial and short-term temporal features. Second, the feature vectors are fed into a transformer network for capturing long-term temporal dependencies. Two main models are proposed, based on the backbone framework: C3DTrans (supervised) and SSC3DTrans (self-supervised). The dataset consisted of 80 videos from two basic laparoscopic tasks: peg transfer (PT) and knot tying (KT). To examine the potential of self-supervision, the models were trained on 60% and 100% of the annotated dataset. In addition, the best-performing model was evaluated on the JIGSAWS robotic surgery dataset. The best model (C3DTrans) achieved accuracies of 88.0% and 95.2% (clip level), and 97.5% and 97.9% (gesture level), for PT and KT, respectively. The SSC3DTrans performed similarly to C3DTrans when trained on 60% of the annotated dataset (about 84% and 93% clip-level accuracies for PT and KT, respectively). The performance of C3DTrans on JIGSAWS was close to 76% accuracy, which was similar to or higher than that of prior techniques based on a single video stream, with no additional video training, and online processing. |
format | Online Article Text |
id | pubmed-9774918 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9774918 2022-12-23 Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning Gazis, Athanasios Karaiskos, Pantelis Loukas, Constantinos Bioengineering (Basel) Article In this study, we propose a deep learning framework and a self-supervision scheme for video-based surgical gesture recognition. The proposed framework is modular. First, a 3D convolutional network extracts feature vectors from video clips for encoding spatial and short-term temporal features. Second, the feature vectors are fed into a transformer network for capturing long-term temporal dependencies. Two main models are proposed, based on the backbone framework: C3DTrans (supervised) and SSC3DTrans (self-supervised). The dataset consisted of 80 videos from two basic laparoscopic tasks: peg transfer (PT) and knot tying (KT). To examine the potential of self-supervision, the models were trained on 60% and 100% of the annotated dataset. In addition, the best-performing model was evaluated on the JIGSAWS robotic surgery dataset. The best model (C3DTrans) achieved accuracies of 88.0% and 95.2% (clip level), and 97.5% and 97.9% (gesture level), for PT and KT, respectively. The SSC3DTrans performed similarly to C3DTrans when trained on 60% of the annotated dataset (about 84% and 93% clip-level accuracies for PT and KT, respectively). The performance of C3DTrans on JIGSAWS was close to 76% accuracy, which was similar to or higher than that of prior techniques based on a single video stream, with no additional video training, and online processing. MDPI 2022-11-29 /pmc/articles/PMC9774918/ /pubmed/36550943 http://dx.doi.org/10.3390/bioengineering9120737 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Gazis, Athanasios Karaiskos, Pantelis Loukas, Constantinos Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning |
title | Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning |
title_full | Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning |
title_fullStr | Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning |
title_full_unstemmed | Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning |
title_short | Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning |
title_sort | surgical gesture recognition in laparoscopic tasks based on the transformer network and self-supervised learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9774918/ https://www.ncbi.nlm.nih.gov/pubmed/36550943 http://dx.doi.org/10.3390/bioengineering9120737 |
work_keys_str_mv | AT gazisathanasios surgicalgesturerecognitioninlaparoscopictasksbasedonthetransformernetworkandselfsupervisedlearning AT karaiskospantelis surgicalgesturerecognitioninlaparoscopictasksbasedonthetransformernetworkandselfsupervisedlearning AT loukasconstantinos surgicalgesturerecognitioninlaparoscopictasksbasedonthetransformernetworkandselfsupervisedlearning |