
Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study

BACKGROUND: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis. OBJECTIVE: This study sought to detect...


Bibliographic Details
Main authors: Hu, Hao-Chun, Chang, Shyue-Yih, Wang, Chuen-Heng, Li, Kai-Jun, Cho, Hsiao-Yun, Chen, Yi-Ting, Lu, Chang-Jung, Tsai, Tzu-Pei, Lee, Oscar Kuang-Sheng
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8241431/
https://www.ncbi.nlm.nih.gov/pubmed/34100770
http://dx.doi.org/10.2196/25247
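
The sample counts reported in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and do not come from the article, and the roughly 80/20 train/test ratio is derived here from the reported counts rather than stated by the authors.

```python
# Sample counts as reported in the abstract (METHODS).
normal = 189
disorders = {
    "vocal atrophy": 224,
    "unilateral vocal paralysis": 50,
    "organic vocal fold lesions": 248,
    "adductor spasmodic dysphonia": 30,
}
train_size, test_size = 593, 148

# Totals implied by the per-class counts.
total_disorders = sum(disorders.values())
total = normal + total_disorders

# Fraction of all samples used for training (derived, not quoted).
train_fraction = train_size / total


def class_shares(counts, n_total):
    """Return each class's share of the whole data set (hypothetical helper)."""
    return {name: n / n_total for name, n in counts.items()}


shares = class_shares({"normal voice": normal, **disorders}, total)

if __name__ == "__main__":
    print(f"disorder samples: {total_disorders}")
    print(f"all samples:      {total}")
    print(f"train fraction:   {train_fraction:.3f}")
    for name, share in shares.items():
        print(f"{name:30s} {share:.1%}")
```

The class shares make the imbalance visible: the two smallest classes together account for under an eighth of the data, which is worth keeping in mind when reading the overall accuracy figures.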
_version_ 1783715410332352512
author Hu, Hao-Chun
Chang, Shyue-Yih
Wang, Chuen-Heng
Li, Kai-Jun
Cho, Hsiao-Yun
Chen, Yi-Ting
Lu, Chang-Jung
Tsai, Tzu-Pei
Lee, Oscar Kuang-Sheng
author_facet Hu, Hao-Chun
Chang, Shyue-Yih
Wang, Chuen-Heng
Li, Kai-Jun
Cho, Hsiao-Yun
Chen, Yi-Ting
Lu, Chang-Jung
Tsai, Tzu-Pei
Lee, Oscar Kuang-Sheng
author_sort Hu, Hao-Chun
collection PubMed
description BACKGROUND: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis. OBJECTIVE: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence. METHODS: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists. RESULTS: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. Compared with the accuracy of human specialists, the overall accuracy rates were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors. CONCLUSIONS: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This approach involving artificial intelligence could be clinically useful for screening general vocal fold disease using the voice. The approach includes a quick survey and a general health examination. It can be applied during telemedicine in areas with primary care units lacking laryngoscopic abilities. It could support physicians when prescreening cases by allowing for invasive examinations to be performed only for cases involving problems with automatic recognition or listening and for professional analyses of other clinical examination results that reveal doubts about the presence of pathologies.
format Online
Article
Text
id pubmed-8241431
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8241431 2021-07-09 Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study Hu, Hao-Chun Chang, Shyue-Yih Wang, Chuen-Heng Li, Kai-Jun Cho, Hsiao-Yun Chen, Yi-Ting Lu, Chang-Jung Tsai, Tzu-Pei Lee, Oscar Kuang-Sheng J Med Internet Res Original Paper BACKGROUND: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis. OBJECTIVE: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence. METHODS: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists. RESULTS: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. Compared with the accuracy of human specialists, the overall accuracy rates were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors. CONCLUSIONS: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This approach involving artificial intelligence could be clinically useful for screening general vocal fold disease using the voice. The approach includes a quick survey and a general health examination. It can be applied during telemedicine in areas with primary care units lacking laryngoscopic abilities. It could support physicians when prescreening cases by allowing for invasive examinations to be performed only for cases involving problems with automatic recognition or listening and for professional analyses of other clinical examination results that reveal doubts about the presence of pathologies. JMIR Publications 2021-06-08 /pmc/articles/PMC8241431/ /pubmed/34100770 http://dx.doi.org/10.2196/25247 Text en ©Hao-Chun Hu, Shyue-Yih Chang, Chuen-Heng Wang, Kai-Jun Li, Hsiao-Yun Cho, Yi-Ting Chen, Chang-Jung Lu, Tzu-Pei Tsai, Oscar Kuang-Sheng Lee. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 08.06.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Hu, Hao-Chun
Chang, Shyue-Yih
Wang, Chuen-Heng
Li, Kai-Jun
Cho, Hsiao-Yun
Chen, Yi-Ting
Lu, Chang-Jung
Tsai, Tzu-Pei
Lee, Oscar Kuang-Sheng
Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study
title Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study
title_full Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study
title_fullStr Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study
title_full_unstemmed Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study
title_short Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study
title_sort deep learning application for vocal fold disease prediction through voice recognition: preliminary development study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8241431/
https://www.ncbi.nlm.nih.gov/pubmed/34100770
http://dx.doi.org/10.2196/25247