
Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network

The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and further quantitative analysis of laryngeal high-speed video (HSV). Quantification of the vocal fold vibration patterns requires as a first step the segmentation of the glottal area within each vi...


Bibliographic Details
Main Authors: Fehling, Mona Kirstin, Grosch, Fabian, Schuster, Maria Elke, Schick, Bernhard, Lohscheller, Jörg
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7010264/
https://www.ncbi.nlm.nih.gov/pubmed/32040514
http://dx.doi.org/10.1371/journal.pone.0227791
_version_ 1783495849650683904
author Fehling, Mona Kirstin
Grosch, Fabian
Schuster, Maria Elke
Schick, Bernhard
Lohscheller, Jörg
author_facet Fehling, Mona Kirstin
Grosch, Fabian
Schuster, Maria Elke
Schick, Bernhard
Lohscheller, Jörg
author_sort Fehling, Mona Kirstin
collection PubMed
description The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and subsequent quantitative analysis of laryngeal high-speed video (HSV). As a first step, quantifying the vocal fold vibration patterns requires segmenting the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation. In this work we propose, for the first time, a procedure that fully automatically segments not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions were used as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The proposed method requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. Thus, it also allows for the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be made freely available to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
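The abstract reports two performance measures: the Dice Coefficient between predicted and ground-truth masks, and a landmark precision given in pixels. The following is a minimal sketch of how such measures could be computed. It is not the authors' code. The function names `dice_coefficient` and `mean_landmark_error` are hypothetical, the Dice formula is the standard 2|A∩B| / (|A| + |B|), and interpreting landmark precision as the mean Euclidean pixel distance is an assumption, since the record does not define it.

```python
import math


def dice_coefficient(pred, truth):
    """Standard Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Masks are given as equally sized 2D lists of 0/1 values.
    """
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / total


def mean_landmark_error(pred_points, truth_points):
    """Mean Euclidean distance in pixels between matched (x, y) landmarks.

    Assumption: this is one plausible reading of "landmark precision".
    """
    if len(pred_points) != len(truth_points) or not pred_points:
        raise ValueError("need the same non-zero number of landmarks")
    distances = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred_points, truth_points)
    ]
    return sum(distances) / len(distances)


# Toy data (invented for illustration, not taken from the study).
truth_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],  # one ground-truth pixel is missed here
    [0, 0, 0, 0],
]
dice = dice_coefficient(pred_mask, truth_mask)

pred_landmarks = [(3, 4), (10, 10)]
truth_landmarks = [(0, 0), (10, 10)]
landmark_error = mean_landmark_error(pred_landmarks, truth_landmarks)
```

In this toy case the masks share 5 pixels out of 5 predicted and 6 ground-truth pixels, so the Dice value is 10/11, and the two landmark distances of 5 and 0 pixels average to 2.5.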
format Online
Article
Text
id pubmed-7010264
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7010264 2020-02-21 Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network Fehling, Mona Kirstin Grosch, Fabian Schuster, Maria Elke Schick, Bernhard Lohscheller, Jörg PLoS One Research Article Public Library of Science 2020-02-10 /pmc/articles/PMC7010264/ /pubmed/32040514 http://dx.doi.org/10.1371/journal.pone.0227791 Text en © 2020 Fehling et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Fehling, Mona Kirstin
Grosch, Fabian
Schuster, Maria Elke
Schick, Bernhard
Lohscheller, Jörg
Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network
title Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network
title_full Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network
title_fullStr Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network
title_full_unstemmed Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network
title_short Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network
title_sort fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep convolutional lstm network
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7010264/
https://www.ncbi.nlm.nih.gov/pubmed/32040514
http://dx.doi.org/10.1371/journal.pone.0227791
work_keys_str_mv AT fehlingmonakirstin fullyautomaticsegmentationofglottisandvocalfoldsinendoscopiclaryngealhighspeedvideosusingadeepconvolutionallstmnetwork
AT groschfabian fullyautomaticsegmentationofglottisandvocalfoldsinendoscopiclaryngealhighspeedvideosusingadeepconvolutionallstmnetwork
AT schustermariaelke fullyautomaticsegmentationofglottisandvocalfoldsinendoscopiclaryngealhighspeedvideosusingadeepconvolutionallstmnetwork
AT schickbernhard fullyautomaticsegmentationofglottisandvocalfoldsinendoscopiclaryngealhighspeedvideosusingadeepconvolutionallstmnetwork
AT lohschellerjorg fullyautomaticsegmentationofglottisandvocalfoldsinendoscopiclaryngealhighspeedvideosusingadeepconvolutionallstmnetwork