Domain Generalization for Language-Independent Automatic Speech Recognition
A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained. Language-independent ASR is difficult to train, because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the international phonetic alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language-independence may be approximated in different ways, depending on the size of the training set, the presence vs. absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and when every language in the test set is related (shares the same language family with) a language in the training set, then language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, then the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization, and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition.
Main Authors: | Gao, Heting; Ni, Junrui; Zhang, Yang; Qian, Kaizhi; Chang, Shiyu; Hasegawa-Johnson, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Artificial Intelligence |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9133481/ https://www.ncbi.nlm.nih.gov/pubmed/35647534 http://dx.doi.org/10.3389/frai.2022.806274 |
_version_ | 1784713578542006272 |
---|---|
author | Gao, Heting Ni, Junrui Zhang, Yang Qian, Kaizhi Chang, Shiyu Hasegawa-Johnson, Mark |
author_facet | Gao, Heting Ni, Junrui Zhang, Yang Qian, Kaizhi Chang, Shiyu Hasegawa-Johnson, Mark |
author_sort | Gao, Heting |
collection | PubMed |
description | A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained. Language-independent ASR is difficult to train, because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the international phonetic alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language-independence may be approximated in different ways, depending on the size of the training set, the presence vs. absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and when every language in the test set is related (shares the same language family with) a language in the training set, then language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, then the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization, and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition. |
format | Online Article Text |
id | pubmed-9133481 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-91334812022-05-27 Domain Generalization for Language-Independent Automatic Speech Recognition Gao, Heting Ni, Junrui Zhang, Yang Qian, Kaizhi Chang, Shiyu Hasegawa-Johnson, Mark Front Artif Intell Artificial Intelligence A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained. Language-independent ASR is difficult to train, because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the international phonetic alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language-independence may be approximated in different ways, depending on the size of the training set, the presence vs. absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and when every language in the test set is related (shares the same language family with) a language in the training set, then language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, then the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization, and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition. 
Frontiers Media S.A. 2022-05-12 /pmc/articles/PMC9133481/ /pubmed/35647534 http://dx.doi.org/10.3389/frai.2022.806274 Text en Copyright © 2022 Gao, Ni, Zhang, Qian, Chang and Hasegawa-Johnson. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Artificial Intelligence Gao, Heting Ni, Junrui Zhang, Yang Qian, Kaizhi Chang, Shiyu Hasegawa-Johnson, Mark Domain Generalization for Language-Independent Automatic Speech Recognition |
title | Domain Generalization for Language-Independent Automatic Speech Recognition |
title_full | Domain Generalization for Language-Independent Automatic Speech Recognition |
title_fullStr | Domain Generalization for Language-Independent Automatic Speech Recognition |
title_full_unstemmed | Domain Generalization for Language-Independent Automatic Speech Recognition |
title_short | Domain Generalization for Language-Independent Automatic Speech Recognition |
title_sort | domain generalization for language-independent automatic speech recognition |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9133481/ https://www.ncbi.nlm.nih.gov/pubmed/35647534 http://dx.doi.org/10.3389/frai.2022.806274 |
work_keys_str_mv | AT gaoheting domaingeneralizationforlanguageindependentautomaticspeechrecognition AT nijunrui domaingeneralizationforlanguageindependentautomaticspeechrecognition AT zhangyang domaingeneralizationforlanguageindependentautomaticspeechrecognition AT qiankaizhi domaingeneralizationforlanguageindependentautomaticspeechrecognition AT changshiyu domaingeneralizationforlanguageindependentautomaticspeechrecognition AT hasegawajohnsonmark domaingeneralizationforlanguageindependentautomaticspeechrecognition |
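The abstract contrasts empirical risk minimization with invariant risk minimization (IRM). As a rough illustration only, and not the authors' implementation, the commonly used IRMv1-style penalty can be sketched for a binary logistic head with a scalar "dummy" classifier w = 1.0; the function names and the NumPy-only binary setup here are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def irm_penalty(scores, labels):
    """IRMv1-style penalty for a binary logistic head (illustrative).

    scores: representation outputs Phi(x), shape (n,)
    labels: targets in {-1, +1}, shape (n,)

    The penalty is the squared gradient of the per-environment logistic
    risk R(w) = mean(log(1 + exp(-y * w * s))) with respect to a fixed
    scalar classifier w, evaluated at w = 1.0. It is large when the
    shared classifier is not optimal for this environment.
    """
    margins = labels * scores                      # y * w * s with w = 1.0
    grad = np.mean(-labels * scores * sigmoid(-margins))  # dR/dw at w = 1
    return grad ** 2

def irm_objective(envs, lam=1.0):
    """Sum of per-environment risks plus lam times the IRM penalties.

    envs: list of (scores, labels) pairs, one per training environment
    (in the paper's setting, one per training language).
    """
    total = 0.0
    for scores, labels in envs:
        risk = np.mean(np.log1p(np.exp(-labels * scores)))  # logistic risk
        total += risk + lam * irm_penalty(scores, labels)
    return total
```

Large `lam` pushes the learned representation toward features whose optimal classifier is the same across environments, which is the intuition behind using IRM for cross-language (cross-domain) generalization described in the abstract.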