ANAD: Arabic news article dataset
Main authors: | Altamimi, Mohammed; Alayba, Abdulaziz M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2023 |
Subjects: | Data Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415830/ https://www.ncbi.nlm.nih.gov/pubmed/37577410 http://dx.doi.org/10.1016/j.dib.2023.109460 |
author | Altamimi, Mohammed; Alayba, Abdulaziz M. |
collection | PubMed |
description | In this paper, we present a Modern Standard Arabic dataset of news articles collected over a one-year period, from 01/01/2021 to 12/31/2021. In total, more than 500,000 articles were collected from 12 Arabic news websites, selected to cover a variety of topics, including sports, economy, local news, politics, tech, tourism, entertainment, cars, health, and art. The dataset is intended to help data scientists explore and experiment in natural language processing, and it can be used to develop machine learning and deep learning models that classify articles by topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main. |
format | Online Article Text |
id | pubmed-10415830 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
journal | Data Brief |
publication date | 2023-07-29 |
license | © 2023 The Author(s). This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/). |
title | ANAD: Arabic news article dataset |
topic | Data Article |
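The description above notes that the dataset can be used to train models that classify articles by topic. As a hedged illustration only, the following sketch trains a bag-of-words multinomial Naive Bayes baseline with Laplace smoothing in plain Python. It does not read the repository's files, because this record does not describe their layout. The function names (`tokenize`, `train_nb`, `predict`) and the four-sentence toy corpus are hypothetical stand-ins for (text, topic) pairs loaded from ANAD, and Naive Bayes is a swapped-in baseline, not a method this record attributes to the authors.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenization. Real Arabic text would usually also need
    # normalization (e.g. stripping diacritics), which this sketch omits.
    return text.split()


def train_nb(docs: list[tuple[str, str]], alpha: float = 1.0) -> dict:
    """Fit a multinomial Naive Bayes model from (text, topic) pairs."""
    doc_counts = Counter()               # number of documents per topic
    word_counts = defaultdict(Counter)   # token frequencies per topic
    vocab = set()
    for text, topic in docs:
        tokens = tokenize(text)
        doc_counts[topic] += 1
        word_counts[topic].update(tokens)
        vocab.update(tokens)

    total_docs = sum(doc_counts.values())
    model = {}
    for topic, n_docs in doc_counts.items():
        # Laplace (add-alpha) smoothing over the shared vocabulary.
        denom = sum(word_counts[topic].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / total_docs)
        log_lik = {w: math.log((word_counts[topic][w] + alpha) / denom) for w in vocab}
        model[topic] = (log_prior, log_lik)
    return model


def predict(model: dict, text: str) -> str:
    """Return the topic with the highest posterior log-probability."""
    def score(topic: str) -> float:
        log_prior, log_lik = model[topic]
        # Tokens never seen during training are skipped.
        return log_prior + sum(log_lik[t] for t in tokenize(text) if t in log_lik)
    return max(model, key=score)


# Hypothetical toy examples standing in for (text, topic) pairs from ANAD.
toy_docs = [
    ("فاز الفريق في المباراة", "sports"),
    ("هدف في المباراة النهائية", "sports"),
    ("ارتفع سعر النفط في السوق", "economy"),
    ("البورصة والسوق والأسهم", "economy"),
]
model = train_nb(toy_docs)
print(predict(model, "الفريق المباراة"))  # → sports
```

Naive Bayes is used here only because it is a simple, dependency-free baseline; with 500,000+ articles, a practical pipeline would more likely hold out a test split and compare against stronger models.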