Ar-DAD: Arabic diversified audio dataset
The automatic identification and verification of speakers through representative audio continue to gain the attention of many researchers with diverse domains of applications. Despite this diversity, the availability of classified and categorized multi-purpose Arabic audio libraries is scarce. There...
Main Authors: | Lataifeh, Mohammed; Elnagar, Ashraf |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2020 |
Subjects: | Data Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7689366/ https://www.ncbi.nlm.nih.gov/pubmed/33294506 http://dx.doi.org/10.1016/j.dib.2020.106503 |
_version_ | 1783613851849195520 |
---|---|
author | Lataifeh, Mohammed Elnagar, Ashraf |
author_facet | Lataifeh, Mohammed Elnagar, Ashraf |
author_sort | Lataifeh, Mohammed |
collection | PubMed |
description | The automatic identification and verification of speakers through representative audio continue to gain the attention of many researchers with diverse domains of applications. Despite this diversity, the availability of classified and categorized multi-purpose Arabic audio libraries is scarce. Therefore, we introduce a large Arabic-based audio clips dataset (15810 clips) of 30 popular reciters cantillating 37 chapters from the Holy Quran. These chapters have a variable number of verses saved to different subsequent folders, where each verse is allocated one folder containing 30 audio clips for the declared reciters covering the same textual content. An additional 397 audio clips for 12 competent imitators of the top reciters are collected based on popularity and number of views/downloads to allow for cross-comparison of text, reciters, and authenticity. Based on the volume, quality, and rich diversity of this dataset we anticipate a wide range of deployments for speaker identification, in addition to setting a new direction for the structure and organization of similar large audio clips dataset. |
format | Online Article Text |
id | pubmed-7689366 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-76893662020-12-07 Ar-DAD: Arabic diversified audio dataset Lataifeh, Mohammed Elnagar, Ashraf Data Brief Data Article The automatic identification and verification of speakers through representative audio continue to gain the attention of many researchers with diverse domains of applications. Despite this diversity, the availability of classified and categorized multi-purpose Arabic audio libraries is scarce. Therefore, we introduce a large Arabic-based audio clips dataset (15810 clips) of 30 popular reciters cantillating 37 chapters from the Holy Quran. These chapters have a variable number of verses saved to different subsequent folders, where each verse is allocated one folder containing 30 audio clips for the declared reciters covering the same textual content. An additional 397 audio clips for 12 competent imitators of the top reciters are collected based on popularity and number of views/downloads to allow for cross-comparison of text, reciters, and authenticity. Based on the volume, quality, and rich diversity of this dataset we anticipate a wide range of deployments for speaker identification, in addition to setting a new direction for the structure and organization of similar large audio clips dataset. Elsevier 2020-11-07 /pmc/articles/PMC7689366/ /pubmed/33294506 http://dx.doi.org/10.1016/j.dib.2020.106503 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Data Article Lataifeh, Mohammed Elnagar, Ashraf Ar-DAD: Arabic diversified audio dataset |
title | Ar-DAD: Arabic diversified audio dataset |
title_full | Ar-DAD: Arabic diversified audio dataset |
title_fullStr | Ar-DAD: Arabic diversified audio dataset |
title_full_unstemmed | Ar-DAD: Arabic diversified audio dataset |
title_short | Ar-DAD: Arabic diversified audio dataset |
title_sort | ar-dad: arabic diversified audio dataset |
topic | Data Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7689366/ https://www.ncbi.nlm.nih.gov/pubmed/33294506 http://dx.doi.org/10.1016/j.dib.2020.106503 |
work_keys_str_mv | AT lataifehmohammed ardadarabicdiversifiedaudiodataset AT elnagarashraf ardadarabicdiversifiedaudiodataset |
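
The description above states that each verse is allocated one folder holding 30 clips, one per reciter, for 15810 clips in total. That implies 15810 / 30 = 527 verse folders. The following is a minimal sketch of how such a layout might be indexed and sanity-checked. The directory depth (`root/chapter/verse/clip`), the `.wav` extension, and the function names `index_clips` and `incomplete_verses` are assumptions made for illustration; the record does not specify file names or formats.

```python
from collections import defaultdict
from pathlib import Path

RECITERS_PER_VERSE = 30  # stated in the record description
TOTAL_CLIPS = 15810      # stated in the record description

# Number of verse folders implied by the two figures above.
EXPECTED_VERSE_FOLDERS = TOTAL_CLIPS // RECITERS_PER_VERSE


def index_clips(root, pattern="*.wav"):
    """Map (chapter, verse) folder names to the clip paths found inside.

    Assumes a hypothetical layout root/<chapter>/<verse>/<clip>; the
    extension in `pattern` is also an assumption.
    """
    index = defaultdict(list)
    for clip in sorted(Path(root).glob(f"*/*/{pattern}")):
        chapter, verse = clip.parts[-3], clip.parts[-2]
        index[(chapter, verse)].append(clip)
    return dict(index)


def incomplete_verses(index, expected=RECITERS_PER_VERSE):
    """Return the verse folders whose clip count differs from `expected`."""
    return {
        key: len(clips)
        for key, clips in index.items()
        if len(clips) != expected
    }
```

Calling `incomplete_verses(index_clips("path/to/dataset"))` on a local copy would list any verse folder that does not hold exactly 30 clips, and `len(index_clips(...))` can be compared against `EXPECTED_VERSE_FOLDERS`. The 397 imitator clips are not covered here, since the record does not say where they are stored.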