
Self-supervised representation learning for surgical activity recognition


Bibliographic Details
Main Authors: Paysan, Daniel; Haug, Luis; Bajka, Michael; Oelhafen, Markus; Buhmann, Joachim M.
Format: Online Article Text
Language: English
Published: Springer International Publishing, 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8589823/
https://www.ncbi.nlm.nih.gov/pubmed/34542839
http://dx.doi.org/10.1007/s11548-021-02493-z
Description

Summary:
Purpose: Virtual reality-based simulators have the potential to become an essential part of surgical education. To make full use of this potential, they must be able to automatically recognize activities performed by users and assess them. Since annotations of trajectories by human experts are expensive, there is a need for methods that can learn to recognize surgical activities in a data-efficient way.

Methods: We use self-supervised training of deep encoder–decoder architectures to learn representations of surgical trajectories from video data. These representations allow for semi-automatic extraction of features that capture information about semantically important events in the trajectories. Such features are then processed as inputs to an unsupervised surgical activity recognition pipeline.

Results: Our experiments document that the performance of hidden semi-Markov models used for recognizing activities in a simulated myomectomy scenario benefits from using features extracted from representations learned while training a deep encoder–decoder network on the task of predicting the remaining surgery progress.

Conclusion: Our work is an important first step toward making efficient use of features obtained from deep representation learning for surgical activity recognition in settings where only a small fraction of the existing data is annotated by human domain experts and where those annotations are potentially incomplete.

Supplementary Information: The online version contains supplementary material available at 10.1007/s11548-021-02493-z.
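To make the self-supervised setup concrete, below is a minimal sketch of the kind of pretext task the abstract describes: an encoder–decoder network trained to regress, for each frame, the remaining surgery progress, a target that needs no human annotation because it follows directly from the trajectory length. PyTorch, the layer sizes, and the recurrent decoder head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of a self-supervised pretext task in the spirit of
# the abstract: an encoder-decoder trained to predict, per frame, the fraction
# of the surgery still remaining. All architecture choices are assumptions.
import torch
import torch.nn as nn

class ProgressNet(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=128):
        super().__init__()
        # Encoder: maps per-frame features (e.g. from a pretrained CNN)
        # to a compact learned representation.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Decoder: a recurrent head that regresses, for each time step,
        # the remaining surgery progress (a value in [0, 1]).
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, frames):               # frames: (batch, time, feat_dim)
        z = self.encoder(frames)             # learned representation
        h, _ = self.decoder(z)
        return self.head(h).squeeze(-1), z   # per-frame progress + features

# The target comes "for free" from the data: remaining progress decreases
# linearly from 1 to 0 over the trajectory, so no expert labels are needed.
model = ProgressNet()
frames = torch.randn(4, 100, 512)            # dummy batch of trajectories
target = (1.0 - torch.linspace(0, 1, 100)).expand(4, -1)
pred, features = model(frames)
loss = nn.functional.mse_loss(pred, target)
loss.backward()
```

After training, the intermediate representations `features` (names here are illustrative) would be the kind of learned encoding from which the paper semi-automatically extracts features for the downstream recognition pipeline.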
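The recognition stage can likewise be illustrated with a generic explicit-duration (hidden semi-Markov) Viterbi decoder over per-frame scores computed from such features. The interface below, with emission log-likelihoods, a transition matrix without self-transitions, and per-state duration distributions, is a standard HSMM formulation given for illustration, not the paper's actual pipeline.

```python
# Generic HSMM (explicit-duration) Viterbi decoding: segments the trajectory
# into activity states, jointly choosing state labels and segment durations.
import numpy as np

def hsmm_viterbi(log_emis, log_trans, log_dur):
    """log_emis: (T, S) per-frame emission log-likelihoods,
    log_trans: (S, S) transition log-probs (no self-transitions),
    log_dur:   (S, Dmax), log_dur[s, d-1] = log P(duration d | state s).
    Returns the most likely per-frame state labeling (length-T array)."""
    T, S = log_emis.shape
    Dmax = log_dur.shape[1]
    # Prefix sums so any segment's emission score is one subtraction.
    cum = np.vstack([np.zeros(S), np.cumsum(log_emis, axis=0)])
    delta = np.full((T + 1, S), -np.inf)
    delta[0] = 0.0                     # uniform (implicit) initial distribution
    back = np.zeros((T + 1, S, 2), dtype=int)   # (previous state, duration)
    for t in range(1, T + 1):
        for s in range(S):
            for d in range(1, min(Dmax, t) + 1):
                seg = cum[t, s] - cum[t - d, s]     # emissions of the segment
                if t - d == 0:                      # segment starts the sequence
                    score, prev = seg + log_dur[s, d - 1], s
                else:
                    prev_scores = delta[t - d] + log_trans[:, s]
                    prev = int(np.argmax(prev_scores))
                    score = prev_scores[prev] + seg + log_dur[s, d - 1]
                if score > delta[t, s]:
                    delta[t, s] = score
                    back[t, s] = (prev, d)
    # Backtrace: recover segments from the best final state.
    labels = np.empty(T, dtype=int)
    s, t = int(np.argmax(delta[T])), T
    while t > 0:
        prev, d = back[t, s]
        labels[t - d:t] = s
        t -= d
        s = prev
    return labels
```

Explicit duration modeling is what distinguishes the semi-Markov model from a plain HMM: it lets the recognizer encode how long surgical activities typically last instead of assuming geometrically distributed durations.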