Learning how to robustly estimate camera pose in endoscopic videos


Bibliographic Details
Main Authors: Hayoz, Michel, Hahne, Christopher, Gallardo, Mathias, Candinas, Daniel, Kurmann, Thomas, Allan, Maximilian, Sznitman, Raphael
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329609/
https://www.ncbi.nlm.nih.gov/pubmed/37184768
http://dx.doi.org/10.1007/s11548-023-02919-w
collection PubMed
description PURPOSE: Surgical scene understanding plays a critical role in the technology stack of tomorrow's intervention-assisting systems in endoscopic surgeries. For this, tracking the endoscope pose is a key component, but it remains challenging due to illumination conditions, deforming tissues and the breathing motion of organs.
METHOD: We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation. Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content. To do so, we train a Deep Declarative Network to take advantage of the expressiveness of deep learning and the robustness of a novel geometric-based optimization approach. We validate our approach on the publicly available SCARED dataset and introduce a new in vivo dataset, StereoMIS, which includes a wider spectrum of typically observed surgical settings.
RESULTS: Our method outperforms state-of-the-art methods on average and, more importantly, in difficult scenarios where tissue deformations and breathing motion are visible. We observed that our proposed weight mappings attenuate the contribution of pixels in ambiguous regions of the images, such as deforming tissues.
CONCLUSION: We demonstrate the effectiveness of our solution for robustly estimating the camera pose in challenging endoscopic surgical scenes. Our contributions can be used to improve related tasks such as simultaneous localization and mapping (SLAM) or 3D reconstruction, thereby advancing surgical scene understanding in minimally invasive surgery.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11548-023-02919-w.
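The abstract describes fitting a camera pose by minimizing geometric losses in which learned per-pixel weights suppress ambiguous regions such as deforming tissue. The sketch below is purely illustrative and is not the authors' Deep Declarative Network: it uses a closed-form weighted rigid alignment (the Kabsch algorithm) as the simplest instance of weighted geometric pose fitting, and the function name and synthetic data are invented for this example.

```python
import numpy as np

def weighted_rigid_transform(src, dst, w):
    """Closed-form weighted rigid alignment (Kabsch algorithm).

    Finds R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.
    The per-point weights play the role of a learned per-pixel
    weight map: points with near-zero weight barely affect the pose.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mu_s = w @ src                      # weighted centroid of source points
    mu_d = w @ dst                      # weighted centroid of target points
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Demo: recover a known pose despite one grossly corrupted point
# (a stand-in for deforming tissue) suppressed by a tiny weight.
rng = np.random.default_rng(0)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.05])
src = rng.standard_normal((200, 3))
dst = src @ R_true.T + t_true
dst[0] += 50.0                  # outlier correspondence
w = np.ones(200)
w[0] = 1e-6                     # a learned weight map would down-weight it
R_est, t_est = weighted_rigid_transform(src, dst, w)
```

With uniform weights the outlier would bias the estimate; down-weighting it restores the true rotation and translation, which is the intuition behind the adaptive weight maps described above.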
id pubmed-10329609
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10329609 2023-07-10 Learning how to robustly estimate camera pose in endoscopic videos. Int J Comput Assist Radiol Surg, Original Article. Springer International Publishing, published online 2023-05-15. © The Author(s) 2023. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, provided appropriate credit is given to the original author(s) and the source.
title Learning how to robustly estimate camera pose in endoscopic videos
topic Original Article