Detection of extra pulses in synthesized glottal area waveforms of dysphonic voices

BACKGROUND AND OBJECTIVES: The description of production kinematics of dysphonic voices plays an important role in the clinical care of voice disorders. However, high-speed videolaryngoscopy is not routinely used in clinical practice, partly because there is a lack of diagnostic markers that may be...

Bibliographic Details
Main Authors: Aichinger, P., Pernkopf, F., Schoentgen, J.
Format: Online Article Text
Language: English
Published: 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6464090/
https://www.ncbi.nlm.nih.gov/pubmed/30996730
http://dx.doi.org/10.1016/j.bspc.2019.01.007
collection PubMed
description BACKGROUND AND OBJECTIVES: The description of production kinematics of dysphonic voices plays an important role in the clinical care of voice disorders. However, high-speed videolaryngoscopy is not routinely used in clinical practice, partly because there is a lack of diagnostic markers that can be obtained from high-speed videos automatically. The aim of the study is to propose and test a procedure that automatically detects extra pulses, which may occur in voiced source signals of pathological voices in addition to cyclic pulses. MATERIAL AND METHODS: Glottal area waveforms (GAWs) are synthesized and used to test a detector for extra pulses. Regarding synthesis, for each GAW a cyclic pulse train is mixed with an extra pulse train and additive noise. The cyclic pulse trains are varied across GAWs in terms of fundamental frequency, pulse shape, and modulation noise, i.e., jitter and shimmer. The extra pulse trains are varied across GAWs in terms of the height of the extra pulses and their rates of occurrence. The energy level of the additive noise is also varied. Regarding detection, first the fundamental frequency is estimated jointly with the cyclic pulse train waveform, second the modulation noise is estimated, and finally the extra pulse train waveform is estimated. Two versions of the detector are compared: one that parameterizes the shapes of the cyclic pulses, and one that uses unparameterized pulse shape estimates. Two corpora are used for testing: one with 100 GAWs containing random extra pulses, and one with 25 GAWs containing extra pulses in the closed phase of each glottal cycle, representing subharmonic voices. RESULTS AND DISCUSSION: With pulse shape parameterization (PSP), a maximum mean accuracy of 88.3% is achieved when detecting random extra pulses. Without PSP, the maximum mean accuracy drops to 82.9%. Detection performance decreases if the energy level of additive noise is higher than −25 dB with respect to the energy of the cyclic pulse train, and if the irregularity strength exceeds 0.1. For bicyclic, i.e., subharmonic, voices, the approach fails without PSP, whereas with PSP a mean sensitivity of 87.4% is achieved. CONCLUSION: A synthesizer for GAWs containing extra pulses and a detector for extra pulses are proposed. With PSP, favorable detector performance is observed when the additive noise level and the irregularity strength are not too high. In signals with high noise levels, the detector without PSP outperforms the one with PSP. Detection of extra pulses fails if the irregularity strength is large. For subharmonic voices, PSP must be used.
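The synthesis step described in the abstract (a cyclic pulse train with jitter and shimmer, mixed with an extra pulse train and additive noise) can be illustrated with a minimal Python sketch. This is not the authors' synthesizer: the raised-cosine pulse shape, the Gaussian jitter and shimmer model, the placement of extra pulses in the closed phase with probability `extra_rate`, and all function and parameter names are assumptions made only for illustration. The noise level is set in dB relative to the energy of the cyclic pulse train, as in the abstract.

```python
import math
import random


def hann_pulse(length):
    """Raised-cosine pulse of `length` samples (an assumed pulse shape)."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def add_pulse(signal, start, pulse, height):
    """Add a scaled pulse into `signal`, dropping samples past the end."""
    for n, value in enumerate(pulse):
        if 0 <= start + n < len(signal):
            signal[start + n] += height * value


def synthesize_gaw(fs=4000, f0=100.0, duration=0.5, open_quotient=0.6,
                   jitter=0.01, shimmer=0.05, extra_rate=0.3,
                   extra_height=0.5, noise_db=-30.0, seed=0):
    """Return (gaw, cyclic, extra, noise, extra_onsets); hypothetical model."""
    rng = random.Random(seed)
    n_samples = int(fs * duration)
    cyclic = [0.0] * n_samples
    extra = [0.0] * n_samples
    extra_onsets = []

    t = 0.0  # onset of the current cycle, in samples
    while t < n_samples:
        # Jitter: random perturbation of the cycle length.
        period = max((fs / f0) * (1.0 + jitter * rng.gauss(0.0, 1.0)), 4.0)
        # Shimmer: random perturbation of the cyclic pulse height.
        height = 1.0 + shimmer * rng.gauss(0.0, 1.0)
        open_len = max(2, int(round(open_quotient * period)))
        add_pulse(cyclic, int(round(t)), hann_pulse(open_len), height)

        # With probability extra_rate, add an extra pulse in the closed phase.
        closed_len = int(round(period)) - open_len
        onset = int(round(t)) + open_len
        if closed_len >= 2 and onset < n_samples and rng.random() < extra_rate:
            add_pulse(extra, onset, hann_pulse(closed_len), extra_height)
            extra_onsets.append(onset)
        t += period

    # Additive noise, scaled to noise_db relative to the cyclic train energy.
    cyclic_energy = sum(x * x for x in cyclic)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    raw_energy = sum(x * x for x in raw)
    gain = math.sqrt(cyclic_energy * 10.0 ** (noise_db / 10.0) / raw_energy)
    noise = [gain * x for x in raw]

    gaw = [c + e + w for c, e, w in zip(cyclic, extra, noise)]
    return gaw, cyclic, extra, noise, extra_onsets
```

Under these assumptions, setting `extra_rate=1.0` places an extra pulse in the closed phase of every cycle, which loosely mimics the subharmonic corpus, while smaller values give randomly occurring extra pulses. The returned `extra_onsets` list could serve as ground truth when scoring a detector.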
format Online
Article
Text
id pubmed-6464090
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6464090 2019-04-15 Detection of extra pulses in synthesized glottal area waveforms of dysphonic voices Aichinger, P. Pernkopf, F. Schoentgen, J. Biomed Signal Process Control Article 2019-04 /pmc/articles/PMC6464090/ /pubmed/30996730 http://dx.doi.org/10.1016/j.bspc.2019.01.007 Text en http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
topic Article