End‐to‐end deep learning classification of vocal pathology using stacked vowels

Bibliographic Details
Main authors: Liu, George S., Hodges, Jordan M., Yu, Jingzhi, Sung, C. Kwang, Erickson‐DiRenzo, Elizabeth, Doyle, Philip C.
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2023
Subjects: Laryngology, Speech and Language Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10601590/
https://www.ncbi.nlm.nih.gov/pubmed/37899847
http://dx.doi.org/10.1002/lio2.1144
author Liu, George S.
Hodges, Jordan M.
Yu, Jingzhi
Sung, C. Kwang
Erickson‐DiRenzo, Elizabeth
Doyle, Philip C.
collection PubMed
description OBJECTIVES: Advances in artificial intelligence (AI) technology have increased the feasibility of classifying voice disorders using voice recordings as a screening tool. This work develops upon previous models that take in single vowel recordings by analyzing multiple vowel recordings simultaneously to enhance prediction of vocal pathology.
METHODS: Voice samples from the Saarbruecken Voice Database, including three sustained vowels (/a/, /i/, /u/) from 687 healthy human participants and 334 dysphonic patients, were used to train 1‐dimensional convolutional neural network models for multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings. Three models were trained: (1) a baseline model that analyzed individual vowels in isolation, (2) a stacked vowel model that analyzed three vowels (/a/, /i/, /u/) in the neutral pitch simultaneously, and (3) a stacked pitch model that analyzed the /a/ vowel in three pitches (low, neutral, and high) simultaneously.
RESULTS: For multiclass classification of healthy, hyperfunctional dysphonia, and laryngitis voice recordings, the stacked vowel model demonstrated higher performance compared with the baseline and stacked pitch models (F1 score 0.81 vs. 0.77 and 0.78, respectively). Specifically, the stacked vowel model achieved higher performance for class‐specific classification of hyperfunctional dysphonia voice samples compared with the baseline and stacked pitch models (F1 score 0.56 vs. 0.49 and 0.50, respectively).
CONCLUSIONS: This study demonstrates the feasibility and potential of analyzing multiple sustained vowel recordings simultaneously to improve AI‐driven screening and classification of vocal pathology. The stacked vowel model architecture in particular offers promise to enhance such an approach.
LAY SUMMARY: AI analysis of multiple vowel recordings can improve classification of voice pathologies compared with models using a single sustained vowel and offer a strategy to enhance AI‐driven screening of voice disorders.
LEVEL OF EVIDENCE: 3
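The abstract does not say how the three vowel recordings are combined before they reach the 1‐dimensional convolutional network. One plausible reading, sketched below purely as an illustration, is that each vowel becomes one input channel of a channels-by-samples matrix. The channel order, the zero-padding to a common length, and the names `VOWELS`, `fit_length`, and `stack_vowels` are all assumptions of this sketch, not details taken from the article.

```python
from typing import Dict, List, Sequence

# Channel order is an assumption of this sketch, not stated in the abstract.
VOWELS = ("a", "i", "u")


def fit_length(signal: Sequence[float], length: int) -> List[float]:
    """Truncate or zero-pad a waveform to exactly `length` samples."""
    clipped = list(signal[:length])
    return clipped + [0.0] * (length - len(clipped))


def stack_vowels(
    recordings: Dict[str, Sequence[float]], length: int
) -> List[List[float]]:
    """Return a channels-by-samples matrix, one row per vowel in VOWELS order."""
    missing = [v for v in VOWELS if v not in recordings]
    if missing:
        raise ValueError(f"missing vowel recordings: {missing}")
    return [fit_length(recordings[v], length) for v in VOWELS]


# Toy waveforms of unequal length stand in for real audio samples.
toy = {"a": [0.1, 0.2, 0.3, 0.4], "i": [0.5, 0.6], "u": [0.7, 0.8, 0.9]}
stacked = stack_vowels(toy, length=3)
```

Under the same assumption, the stacked pitch model would use the identical layout with the low, neutral, and high /a/ recordings as the three rows, while the baseline model would see a single row at a time.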
format Online
Article
Text
id pubmed-10601590
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-10601590 2023-10-27 Laryngoscope Investig Otolaryngol 2023-08-31 Text en © 2023 The Authors. Laryngoscope Investigative Otolaryngology published by Wiley Periodicals LLC on behalf of The Triological Society. This is an open access article under the terms of the https://creativecommons.org/licenses/by-nc-nd/4.0/ License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title End‐to‐end deep learning classification of vocal pathology using stacked vowels
topic Laryngology, Speech and Language Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10601590/
https://www.ncbi.nlm.nih.gov/pubmed/37899847
http://dx.doi.org/10.1002/lio2.1144