Spoken Language Identification Using Deep Learning
Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age. The main challenge is to recognize the features that can distinguish between languages clearly and e...
Main authors: | Singh, Gundeep; Sharma, Sahil; Kumar, Vijay; Kaur, Manjit; Baz, Mohammed; Masud, Mehedi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478554/ https://www.ncbi.nlm.nih.gov/pubmed/34594371 http://dx.doi.org/10.1155/2021/5123671 |
_version_ | 1784576082214649856 |
---|---|
author | Singh, Gundeep Sharma, Sahil Kumar, Vijay Kaur, Manjit Baz, Mohammed Masud, Mehedi |
author_facet | Singh, Gundeep Sharma, Sahil Kumar, Vijay Kaur, Manjit Baz, Mohammed Masud, Mehedi |
author_sort | Singh, Gundeep |
collection | PubMed |
description | Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, manner of speaking, or age. The main challenge is to recognize the features that can distinguish between languages clearly and efficiently. The model takes audio files and converts them into spectrogram images. It then applies a convolutional neural network (CNN) to extract the main features needed to identify the language. The main objective is to identify the spoken language from among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. An experiment was conducted on audio files from the Kaggle dataset named spoken language identification. These audio files consist of utterances, each with a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. More extensive testing shows an overall accuracy of 88%. |
format | Online Article Text |
id | pubmed-8478554 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-84785542021-09-29 Spoken Language Identification Using Deep Learning Singh, Gundeep Sharma, Sahil Kumar, Vijay Kaur, Manjit Baz, Mohammed Masud, Mehedi Comput Intell Neurosci Research Article The process of detecting language from an audio clip by an unknown speaker, regardless of gender, manner of speaking, and distinct age speaker, is defined as spoken language identification (SLID). The considerable task is to recognize the features that can distinguish between languages clearly and efficiently. The model uses audio files and converts those files into spectrogram images. It applies the convolutional neural network (CNN) to bring out main attributes or features to detect output easily. The main objective is to detect languages out of English, French, Spanish, and German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Portuguese, Pushto, Romanian, Korean, Russian, Swedish, Tamil, Thai, and Urdu. An experiment was conducted on different audio files using the Kaggle dataset named spoken language identification. These audio files are comprised of utterances, each of them spanning over a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preparatory results give an overall accuracy of 98%. Extensive and accurate testing show an overall accuracy of 88%. Hindawi 2021-09-20 /pmc/articles/PMC8478554/ /pubmed/34594371 http://dx.doi.org/10.1155/2021/5123671 Text en Copyright © 2021 Gundeep Singh et al. https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Singh, Gundeep Sharma, Sahil Kumar, Vijay Kaur, Manjit Baz, Mohammed Masud, Mehedi Spoken Language Identification Using Deep Learning |
title | Spoken Language Identification Using Deep Learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8478554/ https://www.ncbi.nlm.nih.gov/pubmed/34594371 http://dx.doi.org/10.1155/2021/5123671 |
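
The description field above says the model converts each audio file into a spectrogram image before a CNN is applied. The record does not give the authors' preprocessing settings, so the following is only a minimal sketch of the audio-to-spectrogram step under stated assumptions: a 16 kHz sample rate, a 400-sample Hann window, a 160-sample hop, and a synthetic 440 Hz tone standing in for a 10-second utterance. The function name `log_spectrogram` is hypothetical, and the CNN itself is not shown.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160):
    """Return a log-magnitude spectrogram of shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Number of full frames that fit when sliding by `hop` samples.
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and taper each with the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the real-input FFT of every frame, then log compression.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)


# Synthetic stand-in for one 10-second utterance (assumed 16 kHz sample rate).
sample_rate = 16000
t = np.arange(10 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(clip)
print(spec.shape)  # (frames, frequency bins)
```

The resulting 2-D array can be treated as a single-channel image and passed to an image classifier. A real pipeline would load recorded audio instead of a synthetic tone and would likely use a mel-scaled spectrogram from an audio library, but that choice is not stated in this record.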