End-to-End Lip-Reading Open Cloud-Based Speech Architecture


Bibliographic Details
Main authors:	Jeon, Sanghun; Kim, Mun Sang
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9029225/ https://www.ncbi.nlm.nih.gov/pubmed/35458932 http://dx.doi.org/10.3390/s22082938

collection	PubMed
description	Deep learning technology has encouraged research on noise-robust automatic speech recognition (ASR). The combination of cloud computing technologies and artificial intelligence has significantly improved the performance of open cloud-based speech recognition application programming interfaces (OCSR APIs). Noise-robust ASRs for application in different environments are being developed. This study proposes noise-robust OCSR APIs based on an end-to-end lip-reading architecture for practical applications in various environments. Several OCSR APIs, including Google, Microsoft, Amazon, and Naver, were evaluated using the Google Voice Command Dataset v2 to obtain the optimum performance. Based on performance, the Microsoft API was integrated with Google’s trained word2vec model to enhance the keywords with more complete semantic information. The extracted word vector was integrated with the proposed lip-reading architecture for audio-visual speech recognition. Three forms of convolutional neural networks (3D CNN, 3D dense connection CNN, and multilayer 3D CNN) were used in the proposed lip-reading architecture. Vectors extracted from API and vision were classified after concatenation. The proposed architecture enhanced the OCSR API average accuracy rate by 14.42% using standard ASR evaluation measures along with the signal-to-noise ratio. The proposed model exhibits improved performance in various noise settings, increasing the dependability of OCSR APIs for practical applications.
id	pubmed-9029225
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Sensors (Basel)
published	2022-04-12
license	© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
