Spoken Language Identification Using Deep Learning

The process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age, is defined as spoken language identification (SLID). The main challenge is to recognize the features that can distinguish between languages clearly and e...

Bibliographic Details
Main authors: Singh, Gundeep, Sharma, Sahil, Kumar, Vijay, Kaur, Manjit, Baz, Mohammed, Masud, Mehedi
Format: Online Article Text
Language: English
Published: Hindawi 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478554/
https://www.ncbi.nlm.nih.gov/pubmed/34594371
http://dx.doi.org/10.1155/2021/5123671
author Singh, Gundeep
Sharma, Sahil
Kumar, Vijay
Kaur, Manjit
Baz, Mohammed
Masud, Mehedi
collection PubMed
description The process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age, is defined as spoken language identification (SLID). The main challenge is to recognize the features that can distinguish between languages clearly and efficiently. The model converts audio files into spectrogram images and applies a convolutional neural network (CNN) to extract the main features needed to identify the language. The main objective is to detect languages among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on audio files from the Kaggle dataset named spoken language identification. These audio files consist of utterances, each with a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. Extensive and accurate testing shows an overall accuracy of 88%.
format Online
Article
Text
id pubmed-8478554
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8478554 2021-09-29 Spoken Language Identification Using Deep Learning Singh, Gundeep Sharma, Sahil Kumar, Vijay Kaur, Manjit Baz, Mohammed Masud, Mehedi Comput Intell Neurosci Research Article The process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age, is defined as spoken language identification (SLID). The main challenge is to recognize the features that can distinguish between languages clearly and efficiently. The model converts audio files into spectrogram images and applies a convolutional neural network (CNN) to extract the main features needed to identify the language. The main objective is to detect languages among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on audio files from the Kaggle dataset named spoken language identification. These audio files consist of utterances, each with a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. Extensive and accurate testing shows an overall accuracy of 88%. Hindawi 2021-09-20 /pmc/articles/PMC8478554/ /pubmed/34594371 http://dx.doi.org/10.1155/2021/5123671 Text en Copyright © 2021 Gundeep Singh et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Spoken Language Identification Using Deep Learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478554/
https://www.ncbi.nlm.nih.gov/pubmed/34594371
http://dx.doi.org/10.1155/2021/5123671