Spatial position constraint for unsupervised learning of speech representations

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification...

Bibliographic Details
Main authors:	Humayun, Mohammad Ali; Yassin, Hayati; Abas, Pg Emeroylariffion
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Artificial Intelligence
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323719/ https://www.ncbi.nlm.nih.gov/pubmed/34395866 http://dx.doi.org/10.7717/peerj-cs.650

_version_	1783731297993097216
author	Humayun, Mohammad Ali; Yassin, Hayati; Abas, Pg Emeroylariffion
author_facet	Humayun, Mohammad Ali; Yassin, Hayati; Abas, Pg Emeroylariffion
author_sort	Humayun, Mohammad Ali
collection	PubMed
description	The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is also added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated over a supervised classification task for limited vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in terms of classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis for the Kadazan dataset also confirms the superiority of the proposed representation for limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
format	Online Article Text
id	pubmed-8323719
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8323719 2021-08-13 Spatial position constraint for unsupervised learning of speech representations Humayun, Mohammad Ali; Yassin, Hayati; Abas, Pg Emeroylariffion PeerJ Comput Sci Artificial Intelligence The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is also added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated over a supervised classification task for limited vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in terms of classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis for the Kadazan dataset also confirms the superiority of the proposed representation for limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data. PeerJ Inc. 2021-07-21 /pmc/articles/PMC8323719/ /pubmed/34395866 http://dx.doi.org/10.7717/peerj-cs.650 Text en ©2021 Humayun et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle	Artificial Intelligence Humayun, Mohammad Ali Yassin, Hayati Abas, Pg Emeroylariffion Spatial position constraint for unsupervised learning of speech representations
title	Spatial position constraint for unsupervised learning of speech representations
title_sort	spatial position constraint for unsupervised learning of speech representations
topic	Artificial Intelligence
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323719/ https://www.ncbi.nlm.nih.gov/pubmed/34395866 http://dx.doi.org/10.7717/peerj-cs.650
work_keys_str_mv	AT humayunmohammadali spatialpositionconstraintforunsupervisedlearningofspeechrepresentations AT yassinhayati spatialpositionconstraintforunsupervisedlearningofspeechrepresentations AT abaspgemeroylariffion spatialpositionconstraintforunsupervisedlearningofspeechrepresentations
