AFND: Arabic fake news dataset for the detection and classification of articles credibility

The news credibility detection task has started to gain more attention recently due to the rapid increase of news on different social media platforms. This article provides a large, labeled, and diverse Arabic Fake News Dataset (AFND) that is collected from public Arabic news websites. This dataset...

Bibliographic Details
Main Authors: Khalil, Ashwaq, Jarrah, Moath, Aldwairi, Monther, Jaradat, Manar
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects: Data Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9048144/
https://www.ncbi.nlm.nih.gov/pubmed/35496492
http://dx.doi.org/10.1016/j.dib.2022.108141
author Khalil, Ashwaq
Jarrah, Moath
Aldwairi, Monther
Jaradat, Manar
collection PubMed
description The news credibility detection task has started to gain more attention recently due to the rapid increase of news on different social media platforms. This article provides a large, labeled, and diverse Arabic Fake News Dataset (AFND) that was collected from public Arabic news websites. This dataset enables the research community to use supervised and unsupervised machine learning algorithms to classify the credibility of Arabic news articles. AFND consists of 606,912 public news articles that were scraped from 134 public news websites of 19 different Arab countries over a 6-month period using Python scripts. The Arabic fact-check platform, Misbar, was used to manually classify each public news source as credible, not credible, or undecided. Weak supervision was then applied to give each news article the same label as its public source. AFND is imbalanced in the number of articles in each class. Hence, it is useful for researchers who focus on finding solutions for imbalanced datasets. The dataset is available in JSON format and can be accessed from the Mendeley Data repository.
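The weak-supervision step in the description, where each article inherits the label of its source, can be sketched roughly as follows. This is only an illustration under stated assumptions: the source names, the `source` and `title` fields, and the helper functions are hypothetical, and the record does not specify the actual JSON layout of AFND or the scripts the authors used.

```python
from collections import Counter

# Hypothetical source-level labels, standing in for the manual
# classification of each news source (not taken from AFND itself).
SOURCE_LABELS = {
    "source_1": "credible",
    "source_2": "not credible",
    "source_3": "undecided",
}


def label_articles(articles, source_labels):
    """Return copies of the articles, each tagged with its source's label."""
    labeled = []
    for article in articles:
        label = source_labels[article["source"]]
        labeled.append({**article, "label": label})
    return labeled


def class_counts(labeled_articles):
    """Count articles per label, e.g. to inspect class imbalance."""
    return Counter(a["label"] for a in labeled_articles)


# Toy articles with assumed field names.
articles = [
    {"source": "source_1", "title": "A"},
    {"source": "source_1", "title": "B"},
    {"source": "source_2", "title": "C"},
    {"source": "source_3", "title": "D"},
]

labeled = label_articles(articles, SOURCE_LABELS)
counts = class_counts(labeled)
print(counts)
```

Because labels are assigned per source rather than per article, the per-class counts follow how many articles each source published, which is one way the imbalance noted in the description can arise.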
format Online
Article
Text
id pubmed-9048144
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9048144 2022-04-29 AFND: Arabic fake news dataset for the detection and classification of articles credibility Khalil, Ashwaq Jarrah, Moath Aldwairi, Monther Jaradat, Manar Data Brief Data Article The news credibility detection task has started to gain more attention recently due to the rapid increase of news on different social media platforms. This article provides a large, labeled, and diverse Arabic Fake News Dataset (AFND) that was collected from public Arabic news websites. This dataset enables the research community to use supervised and unsupervised machine learning algorithms to classify the credibility of Arabic news articles. AFND consists of 606,912 public news articles that were scraped from 134 public news websites of 19 different Arab countries over a 6-month period using Python scripts. The Arabic fact-check platform, Misbar, was used to manually classify each public news source as credible, not credible, or undecided. Weak supervision was then applied to give each news article the same label as its public source. AFND is imbalanced in the number of articles in each class. Hence, it is useful for researchers who focus on finding solutions for imbalanced datasets. The dataset is available in JSON format and can be accessed from the Mendeley Data repository. Elsevier 2022-04-08 /pmc/articles/PMC9048144/ /pubmed/35496492 http://dx.doi.org/10.1016/j.dib.2022.108141 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title AFND: Arabic fake news dataset for the detection and classification of articles credibility
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9048144/
https://www.ncbi.nlm.nih.gov/pubmed/35496492
http://dx.doi.org/10.1016/j.dib.2022.108141