Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vectors and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (∼3s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3s task) by up to 26%. This result is in line with previously published research that relied on proprietary LSTM implementations and massive computational resources, which made those earlier results hard to reproduce. Further, we extend those experiments to model unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we analyze the effect of even more limited test data (from 2.25s down to 0.1s), showing that an accuracy of over 50% can be achieved with as little as 0.5s.
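To make the described setup concrete, below is a minimal sketch of an end-to-end LSTM classifier for LID with an extra out-of-set output class. This is not the authors' released system: the choice of PyTorch, the feature type and dimension, the layer sizes, and all names (LstmLid, feat_dim, etc.) are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of an end-to-end LSTM
# classifier for language identification (LID) with an out-of-set (OOS) class.
# Framework (PyTorch), feature type/dimension and layer sizes are assumptions.
import torch
import torch.nn as nn

class LstmLid(nn.Module):
    def __init__(self, feat_dim=39, hidden=512, n_target_langs=8, with_oos=True):
        super().__init__()
        # One extra output unit models all out-of-set languages jointly.
        n_classes = n_target_langs + (1 if with_oos else 0)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, frames, feat_dim) acoustic features, e.g. MFCCs at 100 fps.
        _, (h_n, _) = self.lstm(x)   # last hidden state summarizes the utterance
        return self.out(h_n[-1])     # logits over the 8 targets (+ OOS)

model = LstmLid()
feats = torch.randn(4, 300, 39)      # 4 utterances of ~3s (300 frames at 100 fps)
short = feats[:, :50, :]             # truncated to ~0.5s, as in the duration study
probs = torch.softmax(model(short), dim=-1)
```

The truncation in the last lines mirrors the abstract's short-duration analysis: the same trained model is simply evaluated on fewer frames of test data.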

Bibliographic Details
Main Authors: Zazo, Ruben; Lozano-Diez, Alicia; Gonzalez-Dominguez, Javier; T. Toledano, Doroteo; Gonzalez-Rodriguez, Joaquin
Format: Online Article (Text)
Language: English
Published: Public Library of Science, 2016-01-29 (PLoS One, Research Article)
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4732772/
https://www.ncbi.nlm.nih.gov/pubmed/26824467
http://dx.doi.org/10.1371/journal.pone.0146917
Rights: © 2016 Zazo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.