
SIG-Former: monocular surgical instruction generation with transformers

PURPOSE: Automatic surgical instruction generation is a crucial part of intra-operative surgical assistance. However, understanding surgical activities and translating them into human-like sentences is particularly challenging due to the complexity of the surgical environment and the modal gap between imag...


Bibliographic Details
Main Authors:	Zhang, Jinglu; Nie, Yinyu; Chang, Jian; Zhang, Jian Jun
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2022
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9652298/ https://www.ncbi.nlm.nih.gov/pubmed/35900645 http://dx.doi.org/10.1007/s11548-022-02718-9

_version_	1784828438547267584
author	Zhang, Jinglu Nie, Yinyu Chang, Jian Zhang, Jian Jun
author_facet	Zhang, Jinglu Nie, Yinyu Chang, Jian Zhang, Jian Jun
author_sort	Zhang, Jinglu
collection	PubMed
description	PURPOSE: Automatic surgical instruction generation is a crucial part of intra-operative surgical assistance. However, understanding surgical activities and translating them into human-like sentences is particularly challenging due to the complexity of the surgical environment and the modal gap between images and natural language. To this end, we introduce SIG-Former, a transformer-backboned generation network that predicts surgical instructions from monocular RGB images. METHODS: Taking a surgical image as input, we first extract its visual attentive feature map with a fine-tuned ResNet-101 model, followed by transformer attention blocks that model, respectively, its visual representation, text embedding and visual–textual relational feature. To tackle the loss-metric inconsistency between training and inference in sequence generation, we additionally apply a self-critical reinforcement learning approach to directly optimize the CIDEr score after regular training. RESULTS: We validate our proposed method on the DAISI dataset, which contains 290 clinical procedures from diverse medical subjects. Extensive experiments demonstrate that our method outperforms the baselines and achieves promising performance in both quantitative and qualitative evaluations. CONCLUSION: Our experiments demonstrate that SIG-Former is capable of mapping dependencies between visual features and textual information. Nevertheless, surgical instruction generation is still at a preliminary stage. Future work includes collecting a large clinical dataset, annotating more reference instructions and preparing pre-trained models on medical images.
format	Online Article Text
id	pubmed-9652298
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-9652298 2022-11-15 SIG-Former: monocular surgical instruction generation with transformers Zhang, Jinglu Nie, Yinyu Chang, Jian Zhang, Jian Jun Int J Comput Assist Radiol Surg Original Article
Springer International Publishing 2022-07-28 2022 /pmc/articles/PMC9652298/ /pubmed/35900645 http://dx.doi.org/10.1007/s11548-022-02718-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Original Article Zhang, Jinglu Nie, Yinyu Chang, Jian Zhang, Jian Jun SIG-Former: monocular surgical instruction generation with transformers
title	SIG-Former: monocular surgical instruction generation with transformers
title_full	SIG-Former: monocular surgical instruction generation with transformers
title_fullStr	SIG-Former: monocular surgical instruction generation with transformers
title_full_unstemmed	SIG-Former: monocular surgical instruction generation with transformers
title_short	SIG-Former: monocular surgical instruction generation with transformers
title_sort	sig-former: monocular surgical instruction generation with transformers
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9652298/ https://www.ncbi.nlm.nih.gov/pubmed/35900645 http://dx.doi.org/10.1007/s11548-022-02718-9
work_keys_str_mv	AT zhangjinglu sigformermonocularsurgicalinstructiongenerationwithtransformers AT nieyinyu sigformermonocularsurgicalinstructiongenerationwithtransformers AT changjian sigformermonocularsurgicalinstructiongenerationwithtransformers AT zhangjianjun sigformermonocularsurgicalinstructiongenerationwithtransformers
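
Note on the method described in the abstract: the self-critical reinforcement learning step can be illustrated with a minimal sketch. It assumes the commonly used self-critical formulation, in which the reward of a sampled sentence is compared against the reward of the greedily decoded sentence, and the difference weights the log-probability of the sample. The function name, its parameters and the example numbers are hypothetical and are not taken from the article; in the article the reward is the CIDEr score, which is passed in here as a plain number rather than computed.

```python
from typing import Sequence


def self_critical_loss(sample_logprobs: Sequence[float],
                       sample_reward: float,
                       greedy_reward: float) -> float:
    """Surrogate loss for one sampled sentence (hypothetical helper).

    sample_logprobs: log-probability of each token in the sampled sentence.
    sample_reward:   metric score of the sampled sentence (e.g. CIDEr).
    greedy_reward:   metric score of the greedily decoded sentence,
                     used as the baseline.
    """
    # Positive advantage: the sample beat the greedy baseline.
    advantage = sample_reward - greedy_reward
    # Minimizing this value raises the probability of samples that beat
    # the baseline and lowers it for samples that fall short.
    return -advantage * sum(sample_logprobs)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the article.
    logprobs = [-0.5, -1.0]
    print(self_critical_loss(logprobs, sample_reward=1.2, greedy_reward=0.8))
    print(self_critical_loss(logprobs, sample_reward=0.5, greedy_reward=0.8))
```

Summing the token log-probabilities is one choice; some implementations average over tokens instead, which changes the scale of the loss but not its sign.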
