
Ar-DAD: Arabic diversified audio dataset

The automatic identification and verification of speakers through representative audio continue to attract researchers across diverse application domains. Despite this interest, classified and categorized multi-purpose Arabic audio libraries remain scarce. There...


Bibliographic Details
Main Authors: Lataifeh, Mohammed, Elnagar, Ashraf
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7689366/
https://www.ncbi.nlm.nih.gov/pubmed/33294506
http://dx.doi.org/10.1016/j.dib.2020.106503
author Lataifeh, Mohammed
Elnagar, Ashraf
collection PubMed
description The automatic identification and verification of speakers through representative audio continue to attract researchers across diverse application domains. Despite this interest, classified and categorized multi-purpose Arabic audio libraries remain scarce. Therefore, we introduce a large Arabic audio-clip dataset (15810 clips) of 30 popular reciters cantillating 37 chapters from the Holy Quran. Each chapter has a variable number of verses, and each verse is allocated its own folder containing 30 audio clips from the declared reciters covering the same textual content. An additional 397 audio clips from 12 competent imitators of the top reciters were collected based on popularity and number of views/downloads to allow cross-comparison of text, reciters, and authenticity. Given the volume, quality, and rich diversity of this dataset, we anticipate a wide range of deployments for speaker identification, and we expect it to set a new direction for the structure and organization of similar large audio-clip datasets.
format Online
Article
Text
id pubmed-7689366
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-7689366 2020-12-07 Ar-DAD: Arabic diversified audio dataset Lataifeh, Mohammed Elnagar, Ashraf Data Brief Data Article Elsevier 2020-11-07 /pmc/articles/PMC7689366/ /pubmed/33294506 http://dx.doi.org/10.1016/j.dib.2020.106503 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Ar-DAD: Arabic diversified audio dataset
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7689366/
https://www.ncbi.nlm.nih.gov/pubmed/33294506
http://dx.doi.org/10.1016/j.dib.2020.106503
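
The description above states that each verse has its own folder holding 30 clips, for 15810 clips in total. A minimal sketch of how such a layout might be indexed and checked follows. The `root/<chapter>/<verse>/<clip>` nesting, the `.wav` extension, and the function names `index_clips` and `incomplete_verses` are assumptions made for illustration; the record does not specify folder names or audio format.

```python
from collections import defaultdict
from pathlib import Path

CLIPS_PER_VERSE = 30  # from the description: 30 clips in each verse folder
TOTAL_CLIPS = 15810   # from the description


def index_clips(root, pattern="*.wav"):
    """Map (chapter_folder, verse_folder) -> sorted list of clip paths.

    Assumes a hypothetical layout root/<chapter>/<verse>/<clip>; the real
    folder names and audio extension may differ.
    """
    index = defaultdict(list)
    for clip in Path(root).glob(f"*/*/{pattern}"):
        verse_dir = clip.parent
        index[(verse_dir.parent.name, verse_dir.name)].append(clip)
    return {key: sorted(paths) for key, paths in index.items()}


def incomplete_verses(index, expected=CLIPS_PER_VERSE):
    """Return verse folders whose clip count differs from `expected`."""
    return {key: len(paths) for key, paths in index.items() if len(paths) != expected}


# If every verse folder holds exactly 30 clips, the stated total implies
# 15810 / 30 = 527 verse folders with no remainder.
IMPLIED_VERSE_FOLDERS, REMAINDER = divmod(TOTAL_CLIPS, CLIPS_PER_VERSE)
```

Calling `incomplete_verses(index_clips("path/to/dataset"))` on a local copy would list any verse folder that does not contain the expected 30 clips; an empty result together with 527 indexed folders would be consistent with the counts in the description.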