ANAD: Arabic news article dataset

Bibliographic Details
Main authors: Altamimi, Mohammed, Alayba, Abdulaziz M.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415830/
https://www.ncbi.nlm.nih.gov/pubmed/37577410
http://dx.doi.org/10.1016/j.dib.2023.109460
_version_ 1785087633061314560
author Altamimi, Mohammed
Alayba, Abdulaziz M.
author_facet Altamimi, Mohammed
Alayba, Abdulaziz M.
author_sort Altamimi, Mohammed
collection PubMed
description In this paper, we present a modern standard Arabic dataset based on Arabic news articles collected over a one-year period from 01/01/2021 to 12/31/2021. In total, from 12 Arabic news websites, over 500,000 articles were collected, the selection of which was driven by a variety of topics, including sports, economies, local news, politics, tech, tourism, entertainment, cars, health, and art. The development of this dataset will enable data scientists to explore and experiment effectively in the field of natural language processing, and the dataset can also be used to develop machine learning and deep learning models to classify articles according to topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main.
format Online
Article
Text
id pubmed-10415830
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10415830 2023-08-12 ANAD: Arabic news article dataset Altamimi, Mohammed Alayba, Abdulaziz M. Data Brief Data Article In this paper, we present a modern standard Arabic dataset based on Arabic news articles collected over a one-year period from 01/01/2021 to 12/31/2021. In total, from 12 Arabic news websites, over 500,000 articles were collected, the selection of which was driven by a variety of topics, including sports, economies, local news, politics, tech, tourism, entertainment, cars, health, and art. The development of this dataset will enable data scientists to explore and experiment effectively in the field of natural language processing, and the dataset can also be used to develop machine learning and deep learning models to classify articles according to topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main. Elsevier 2023-07-29 /pmc/articles/PMC10415830/ /pubmed/37577410 http://dx.doi.org/10.1016/j.dib.2023.109460 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Data Article
Altamimi, Mohammed
Alayba, Abdulaziz M.
ANAD: Arabic news article dataset
title ANAD: Arabic news article dataset
title_full ANAD: Arabic news article dataset
title_fullStr ANAD: Arabic news article dataset
title_full_unstemmed ANAD: Arabic news article dataset
title_short ANAD: Arabic news article dataset
title_sort anad: arabic news article dataset
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10415830/
https://www.ncbi.nlm.nih.gov/pubmed/37577410
http://dx.doi.org/10.1016/j.dib.2023.109460
work_keys_str_mv AT altamimimohammed anadarabicnewsarticledataset
AT alaybaabdulazizm anadarabicnewsarticledataset