Deep-Learning-Based Automated Classification of Chinese Speech Sound Disorders

This article describes a system for analyzing acoustic data to assist in the diagnosis and classification of children’s speech sound disorders (SSDs) using a computer. The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a s...

Bibliographic Details
Main Authors: Kuo, Yao-Ming, Ruan, Shanq-Jang, Chen, Yu-Chin, Tu, Ya-Wen
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9324778/
https://www.ncbi.nlm.nih.gov/pubmed/35883979
http://dx.doi.org/10.3390/children9070996
author Kuo, Yao-Ming
Ruan, Shanq-Jang
Chen, Yu-Chin
Tu, Ya-Wen
collection PubMed
description This article describes a system for analyzing acoustic data to assist in the diagnosis and classification of children’s speech sound disorders (SSDs) using a computer. The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrication samples from 90 children aged 3–6 years with normal or pathological articulatory features. Each recording was accompanied by a detailed diagnostic annotation by two speech–language pathologists (SLPs). Classification of the speech samples was accomplished using three well-established neural network models for image classification. The feature maps were created using three sets of MFCC (Mel-frequency cepstral coefficients) parameters extracted from speech sounds and aggregated into a three-dimensional data structure as model input. We employed six techniques for data augmentation to augment the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters. Experiments with different data subsets demonstrate the system’s ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4 percent.
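The aggregation step mentioned in the description (three sets of MFCC parameters combined into a three-dimensional model input) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stack_feature_maps`, the toy values, and the assumption that all three maps already share one shape (coefficients by frames) are hypothetical and chosen only for illustration. The record does not say how the MFCC sets differ or how they are aligned.

```python
from typing import List

Matrix = List[List[float]]  # one feature map: rows = coefficients, columns = frames


def stack_feature_maps(maps: List[Matrix]) -> List[List[List[float]]]:
    """Stack equally sized 2-D maps into a rows x cols x channels structure."""
    if not maps or not maps[0]:
        raise ValueError("need at least one non-empty feature map")
    rows, cols = len(maps[0]), len(maps[0][0])
    for m in maps:
        # Assumption: every map has the same shape; no padding or cropping is done.
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all feature maps must share one shape")
    # Each (row, col) cell holds one value per map, like the channels of an image.
    return [[[m[r][c] for m in maps] for c in range(cols)] for r in range(rows)]


# Hypothetical toy values standing in for three MFCC sets (2 coefficients x 3 frames).
mfcc_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mfcc_b = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
mfcc_c = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]

stacked = stack_feature_maps([mfcc_a, mfcc_b, mfcc_c])
print(len(stacked), len(stacked[0]), len(stacked[0][0]))
```

Laying the three maps out as channels mirrors how an RGB image is stored, which is one plausible reason image-classification networks can consume such input directly.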
format Online
Article
Text
id pubmed-9324778
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9324778 2022-07-27
Deep-Learning-Based Automated Classification of Chinese Speech Sound Disorders
Kuo, Yao-Ming; Ruan, Shanq-Jang; Chen, Yu-Chin; Tu, Ya-Wen
Children (Basel), Article
MDPI 2022-07-01
/pmc/articles/PMC9324778/ /pubmed/35883979 http://dx.doi.org/10.3390/children9070996
Text en
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Deep-Learning-Based Automated Classification of Chinese Speech Sound Disorders
topic Article