TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation
Stuttering, a prevalent neurodevelopmental disorder, profoundly affects fluent speech, causing involuntary interruptions and recurrent sound patterns. This study addresses the critical need for the accurate classification of stuttering types. The researchers introduce “TranStutter”, a pioneering Convolution-free Transformer-based deep learning (DL) model designed to excel in speech disfluency classification.
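As a rough illustration of the pipeline the title and summary describe (a 2D log-mel spectrogram front end followed by a convolution-free Transformer encoder with positional encoding and multi-head self-attention), the sketch below pairs librosa feature extraction with a small PyTorch classifier. It is a minimal sketch under assumed settings: the sample rate, clip length, mel resolution, model sizes, file name, and the six-way label set are illustrative choices, not the authors' published configuration.

```python
# Hypothetical sketch of a TranStutter-style pipeline: log-mel spectrogram input,
# learned positional encoding, multi-head self-attention, no convolutions.
# Hyperparameters and the label set below are assumptions for illustration only.
import librosa
import numpy as np
import torch
import torch.nn as nn

LABELS = ["Block", "Prolongation", "SoundRep", "WordRep", "Interjection", "Fluent"]

def log_mel(path, sr=16000, n_mels=80, clip_secs=3.0):
    """Load an audio clip and return a (frames, n_mels) log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr, duration=clip_secs)
    y = np.pad(y, (0, max(0, int(sr * clip_secs) - len(y))))   # zero-pad short clips
    m = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                       hop_length=160, n_mels=n_mels)
    return librosa.power_to_db(m, ref=np.max).T.astype(np.float32)  # time-major

class DisfluencyTransformer(nn.Module):
    """Convolution-free encoder: per-frame linear embedding + learned positional
    encoding + Transformer encoder layers + mean pooling + linear classifier."""
    def __init__(self, n_mels=80, d_model=192, n_heads=4, n_layers=4,
                 max_frames=301, n_classes=len(LABELS)):
        super().__init__()
        self.embed = nn.Linear(n_mels, d_model)      # no convolutional front end
        self.pos = nn.Parameter(torch.randn(1, max_frames, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                            # x: (batch, frames, n_mels)
        h = self.embed(x) + self.pos[:, : x.size(1)]
        h = self.encoder(h)                          # multi-head self-attention stack
        return self.head(h.mean(dim=1))              # clip-level logits

# Example use (file name is a placeholder):
# feats = torch.from_numpy(log_mel("clip.wav")).unsqueeze(0)
# pred = LABELS[DisfluencyTransformer()(feats).argmax(-1).item()]
```

In this reading, every spectrogram frame becomes a token, so the self-attention layers can relate events anywhere in the clip (for example, the onset and release of a prolonged sound) without any convolutional feature extractor, which matches the "convolution-free" framing in the title.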
Main Authors: Basak, Krishna; Mishra, Nilamadhab; Chang, Hsien-Tsung
Format: Online Article Text
Language: English
Published: MDPI, 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575465/ https://www.ncbi.nlm.nih.gov/pubmed/37836863 http://dx.doi.org/10.3390/s23198033
Field | Value
---|---
_version_ | 1785120928789692416
author | Basak, Krishna; Mishra, Nilamadhab; Chang, Hsien-Tsung
author_facet | Basak, Krishna; Mishra, Nilamadhab; Chang, Hsien-Tsung
author_sort | Basak, Krishna |
collection | PubMed |
description | Stuttering, a prevalent neurodevelopmental disorder, profoundly affects fluent speech, causing involuntary interruptions and recurrent sound patterns. This study addresses the critical need for the accurate classification of stuttering types. The researchers introduce “TranStutter”, a pioneering Convolution-free Transformer-based DL model, designed to excel in speech disfluency classification. Unlike conventional methods, TranStutter leverages Multi-Head Self-Attention and Positional Encoding to capture intricate temporal patterns, yielding superior accuracy. In this study, the researchers employed two benchmark datasets: the Stuttering Events in Podcasts Dataset (SEP-28k) and the FluencyBank Interview Subset. SEP-28k comprises 28,177 audio clips from podcasts, meticulously annotated into distinct dysfluent and non-dysfluent labels, including Block (BL), Prolongation (PR), Sound Repetition (SR), Word Repetition (WR), and Interjection (IJ). The FluencyBank subset encompasses 4144 audio clips from 32 People Who Stutter (PWS), providing a diverse set of speech samples. TranStutter’s performance was assessed rigorously. On SEP-28k, the model achieved an impressive accuracy of 88.1%. Furthermore, on the FluencyBank dataset, TranStutter demonstrated its efficacy with an accuracy of 80.6%. These results highlight TranStutter’s significant potential in revolutionizing the diagnosis and treatment of stuttering, thereby contributing to the evolving landscape of speech pathology and neurodevelopmental research. The innovative integration of Multi-Head Self-Attention and Positional Encoding distinguishes TranStutter, enabling it to discern nuanced disfluencies with unparalleled precision. This novel approach represents a substantial leap forward in the field of speech pathology, promising more accurate diagnostics and targeted interventions for individuals with stuttering disorders. |
format | Online Article Text |
id | pubmed-10575465 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10575465 2023-10-14 TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation Basak, Krishna Mishra, Nilamadhab Chang, Hsien-Tsung Sensors (Basel) Article Stuttering, a prevalent neurodevelopmental disorder, profoundly affects fluent speech, causing involuntary interruptions and recurrent sound patterns. This study addresses the critical need for the accurate classification of stuttering types. The researchers introduce “TranStutter”, a pioneering Convolution-free Transformer-based DL model, designed to excel in speech disfluency classification. Unlike conventional methods, TranStutter leverages Multi-Head Self-Attention and Positional Encoding to capture intricate temporal patterns, yielding superior accuracy. In this study, the researchers employed two benchmark datasets: the Stuttering Events in Podcasts Dataset (SEP-28k) and the FluencyBank Interview Subset. SEP-28k comprises 28,177 audio clips from podcasts, meticulously annotated into distinct dysfluent and non-dysfluent labels, including Block (BL), Prolongation (PR), Sound Repetition (SR), Word Repetition (WR), and Interjection (IJ). The FluencyBank subset encompasses 4144 audio clips from 32 People Who Stutter (PWS), providing a diverse set of speech samples. TranStutter’s performance was assessed rigorously. On SEP-28k, the model achieved an impressive accuracy of 88.1%. Furthermore, on the FluencyBank dataset, TranStutter demonstrated its efficacy with an accuracy of 80.6%. These results highlight TranStutter’s significant potential in revolutionizing the diagnosis and treatment of stuttering, thereby contributing to the evolving landscape of speech pathology and neurodevelopmental research. The innovative integration of Multi-Head Self-Attention and Positional Encoding distinguishes TranStutter, enabling it to discern nuanced disfluencies with unparalleled precision. This novel approach represents a substantial leap forward in the field of speech pathology, promising more accurate diagnostics and targeted interventions for individuals with stuttering disorders. MDPI 2023-09-22 /pmc/articles/PMC10575465/ /pubmed/37836863 http://dx.doi.org/10.3390/s23198033 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article; Basak, Krishna; Mishra, Nilamadhab; Chang, Hsien-Tsung; TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation
title | TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation |
title_full | TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation |
title_fullStr | TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation |
title_full_unstemmed | TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation |
title_short | TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation |
title_sort | transtutter: a convolution-free transformer-based deep learning method to classify stuttered speech using 2d mel-spectrogram visualization and attention-based feature representation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575465/ https://www.ncbi.nlm.nih.gov/pubmed/37836863 http://dx.doi.org/10.3390/s23198033 |
work_keys_str_mv | AT basakkrishna transtutteraconvolutionfreetransformerbaseddeeplearningmethodtoclassifystutteredspeechusing2dmelspectrogramvisualizationandattentionbasedfeaturerepresentation AT mishranilamadhab transtutteraconvolutionfreetransformerbaseddeeplearningmethodtoclassifystutteredspeechusing2dmelspectrogramvisualizationandattentionbasedfeaturerepresentation AT changhsientsung transtutteraconvolutionfreetransformerbaseddeeplearningmethodtoclassifystutteredspeechusing2dmelspectrogramvisualizationandattentionbasedfeaturerepresentation |