
Fake news detection in Urdu language using machine learning

With the rise of social media, the dissemination of forged content and news has increased. Consequently, fake news detection has emerged as an important research problem. Several approaches have been presented to discriminate fake news from real news; however, such approaches lack robustness...

Bibliographic Details
Main authors:	Farooq, Muhammad Shoaib, Naseem, Ansar, Rustam, Furqan, Ashraf, Imran
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	Computational Linguistics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280395/ https://www.ncbi.nlm.nih.gov/pubmed/37346628 http://dx.doi.org/10.7717/peerj-cs.1353

author	Farooq, Muhammad Shoaib Naseem, Ansar Rustam, Furqan Ashraf, Imran
author_sort	Farooq, Muhammad Shoaib
collection	PubMed
description	With the rise of social media, the dissemination of forged content and news has increased. Consequently, fake news detection has emerged as an important research problem. Several approaches have been presented to discriminate fake news from real news; however, such approaches lack robustness for multi-domain datasets, especially within the context of Urdu news. In addition, some studies use datasets machine-translated from English to Urdu with Google Translate, without manual verification. This limits the wide use of such approaches in real-world applications. This study investigates these issues and proposes a fake news classifier for Urdu news. The collected dataset covers nine different domains and comprises 4097 news items. Experiments are performed using term frequency-inverse document frequency (TF-IDF) and bag of words (BoW) features in combination with n-grams. The major contribution of this study is the use of feature stacking, where the feature vectors of the preprocessed text and of the verbs extracted from the preprocessed text are combined. Support vector machine, k-nearest neighbor, and ensemble models like random forest (RF) and extra tree (ET) were used for bagging, while stacking was applied with ET and RF as base learners and logistic regression as the meta learner. To check the robustness of the models, fivefold and independent set testing were employed. Experimental results indicate that stacking achieves 93.39%, 88.96%, 96.33%, 86.2%, and 93.17% scores for accuracy, specificity, sensitivity, MCC, ROC, and F1 score, respectively.
format	Online Article Text
id	pubmed-10280395
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-10280395 2023-06-21 Fake news detection in Urdu language using machine learning Farooq, Muhammad Shoaib Naseem, Ansar Rustam, Furqan Ashraf, Imran PeerJ Comput Sci Computational Linguistics PeerJ Inc. 2023-05-23 /pmc/articles/PMC10280395/ /pubmed/37346628 http://dx.doi.org/10.7717/peerj-cs.1353 Text en ©2023 Farooq et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Fake news detection in Urdu language using machine learning
topic	Computational Linguistics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280395/ https://www.ncbi.nlm.nih.gov/pubmed/37346628 http://dx.doi.org/10.7717/peerj-cs.1353
work_keys_str_mv	AT farooqmuhammadshoaib fakenewsdetectioninurdulanguageusingmachinelearning AT naseemansar fakenewsdetectioninurdulanguageusingmachinelearning AT rustamfurqan fakenewsdetectioninurdulanguageusingmachinelearning AT ashrafimran fakenewsdetectioninurdulanguageusingmachinelearning
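
The feature stacking described in the abstract (combining the feature vector of the preprocessed text with the feature vector of the verbs extracted from it) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names, the toy tokens, and the smoothed IDF formula are assumptions made for illustration, verb extraction is assumed to have been done already, and the classifiers (ET and RF stacked under logistic regression) are not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def fit_tfidf(docs, n_max=2):
    """Learn a sorted n-gram vocabulary and smoothed IDF weights."""
    doc_freq = Counter()
    for tokens in docs:
        # Count each n-gram once per document (document frequency).
        doc_freq.update(set(ngrams(tokens, n_max)))
    vocab = sorted(doc_freq)
    n_docs = len(docs)
    # Smoothed IDF: an assumption, the abstract does not state the formula.
    idf = [math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab]
    return vocab, idf


def transform(tokens, vocab, idf, n_max=2):
    """Map one tokenized document to a dense TF-IDF vector over vocab."""
    counts = Counter(ngrams(tokens, n_max))
    return [counts[t] * w for t, w in zip(vocab, idf)]


def stack_features(texts, verbs, n_max=2):
    """Concatenate the TF-IDF vector of each text with that of its verbs."""
    text_vocab, text_idf = fit_tfidf(texts, n_max)
    verb_vocab, verb_idf = fit_tfidf(verbs, n_max=1)
    rows = [
        transform(t, text_vocab, text_idf, n_max)
        + transform(v, verb_vocab, verb_idf, n_max=1)
        for t, v in zip(texts, verbs)
    ]
    return rows, len(text_vocab), len(verb_vocab)


# Toy, already-tokenized placeholders (hypothetical, not from the dataset).
texts = [["govt", "announces", "budget"], ["aliens", "announce", "budget"]]
verbs = [["announces"], ["announce"]]
features, n_text, n_verb = stack_features(texts, verbs)
```

Each row of `features` holds the text n-gram weights first and the verb weights last, so a downstream classifier sees both views of the document in one vector.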
