
Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping

Within speech processing, articulatory-to-acoustic mapping (AAM) methods can apply ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used, which provide a wedge-shaped visual image. However, this process is optimized for the visual inspection of the human eye, and the...


Bibliographic Details
Main Authors:	Csapó, Tamás Gábor; Gosztolya, Gábor; Tóth, László; Shandiz, Amin Honarmandi; Markó, Alexandra
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9696288/ https://www.ncbi.nlm.nih.gov/pubmed/36433196 http://dx.doi.org/10.3390/s22228601

author	Csapó, Tamás Gábor; Gosztolya, Gábor; Tóth, László; Shandiz, Amin Honarmandi; Markó, Alexandra
author_sort	Csapó, Tamás Gábor
collection	PubMed
description	Within speech processing, articulatory-to-acoustic mapping (AAM) methods can apply ultrasound tongue imaging (UTI) as an input. (Micro)convex transducers are mostly used, which provide a wedge-shaped visual image. However, this process is optimized for the visual inspection of the human eye, and the signal is often post-processed by the equipment. With newer ultrasound equipment, it is now possible to gain access to the raw scanline data (i.e., ultrasound echo return) without any internal post-processing. In this study, we compared the raw scanline representation with the wedge-shaped processed UTI as the input for the residual network applied for AAM, and we also investigated the optimal size of the input image. We found no significant differences between the performance attained using the raw data and the wedge-shaped image extrapolated from it. We found the optimal pixel size to be 64 × 43 in the case of the raw scanline input, and 64 × 64 when transformed to a wedge. Therefore, it is not necessary to use the full original 64 × 842 pixel raw scanline; a smaller image is sufficient. This allows for the building of smaller networks, and will be beneficial for the development of session and speaker-independent methods for practical applications. AAM systems have the target application of a “silent speech interface”, which could be helpful for the communication of the speaking-impaired, in military applications, or in extremely noisy conditions.
format	Online Article Text
id	pubmed-9696288
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9696288 2022-11-26 Sensors (Basel) Article MDPI 2022-11-08 /pmc/articles/PMC9696288/ /pubmed/36433196 http://dx.doi.org/10.3390/s22228601 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Optimizing the Ultrasound Tongue Image Representation for Residual Network-Based Articulatory-to-Acoustic Mapping
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9696288/ https://www.ncbi.nlm.nih.gov/pubmed/36433196 http://dx.doi.org/10.3390/s22228601
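
The abstract reports that the full 64 × 842 raw scanline frame can be reduced to 64 × 43 without loss of performance. As a rough illustration of what that reduction means in terms of input size, the following is a minimal sketch, not the authors' method: the abstract does not say how the images were resized, so linear interpolation along the echo-depth axis is an assumption here, as is reading the first dimension as the number of scanlines. The function name `resize_depth` and the synthetic frame are hypothetical.

```python
def resize_depth(frame, new_depth):
    """Linearly interpolate every scanline of `frame` to `new_depth` samples.

    `frame` is a list of scanlines, each a list of echo samples.
    Assumption: linear interpolation; the source does not state the
    resizing method that was actually used. Requires new_depth >= 2.
    """
    old_depth = len(frame[0])
    resized = []
    for scanline in frame:
        row = []
        for j in range(new_depth):
            # Map output index j onto the original sample axis.
            pos = j * (old_depth - 1) / (new_depth - 1)
            lo = int(pos)
            hi = min(lo + 1, old_depth - 1)
            frac = pos - lo
            row.append((1 - frac) * scanline[lo] + frac * scanline[hi])
        resized.append(row)
    return resized


# Synthetic stand-in for one raw frame: 64 scanlines x 842 echo samples.
raw_frame = [[float(s) for s in range(842)] for _ in range(64)]
small_frame = resize_depth(raw_frame, 43)

# Ratio of input values per frame before and after the reduction.
reduction = (64 * 842) / (64 * 43)
print(len(small_frame), len(small_frame[0]), round(reduction, 1))
```

Under these assumptions each frame shrinks from 53,888 to 2,752 input values, roughly a 19.6-fold reduction, which is consistent with the abstract's remark that the smaller input allows smaller networks to be built.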

