Fake news detection: deep semantic representation with enhanced feature engineering

Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation i...

Bibliographic Details
Main authors:	Samadi, Mohammadreza; Momtazi, Saeedeh
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2023
Subjects:	Regular Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998010/ https://www.ncbi.nlm.nih.gov/pubmed/37362632 http://dx.doi.org/10.1007/s41060-023-00387-8

_version_	1784903381842657280
author	Samadi, Mohammadreza Momtazi, Saeedeh
author_facet	Samadi, Mohammadreza Momtazi, Saeedeh
author_sort	Samadi, Mohammadreza
collection	PubMed
description	Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features. To substantiate the effectiveness of feature engineering besides semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).
format	Online Article Text
id	pubmed-9998010
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-9998010 2023-03-10 Fake news detection: deep semantic representation with enhanced feature engineering Samadi, Mohammadreza Momtazi, Saeedeh Int J Data Sci Anal Regular Paper Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance the semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features. To substantiate the effectiveness of feature engineering besides semantic features, we proposed a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset by improving accuracy and f1-score metrics by 1.89% and 1.74%, respectively. The model also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021). Springer International Publishing 2023-03-09 /pmc/articles/PMC9998010/ /pubmed/37362632 http://dx.doi.org/10.1007/s41060-023-00387-8 Text en © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Regular Paper Samadi, Mohammadreza Momtazi, Saeedeh Fake news detection: deep semantic representation with enhanced feature engineering
title	Fake news detection: deep semantic representation with enhanced feature engineering
title_full	Fake news detection: deep semantic representation with enhanced feature engineering
title_fullStr	Fake news detection: deep semantic representation with enhanced feature engineering
title_full_unstemmed	Fake news detection: deep semantic representation with enhanced feature engineering
title_short	Fake news detection: deep semantic representation with enhanced feature engineering
title_sort	fake news detection: deep semantic representation with enhanced feature engineering
topic	Regular Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9998010/ https://www.ncbi.nlm.nih.gov/pubmed/37362632 http://dx.doi.org/10.1007/s41060-023-00387-8
work_keys_str_mv	AT samadimohammadreza fakenewsdetectiondeepsemanticrepresentationwithenhancedfeatureengineering AT momtazisaeedeh fakenewsdetectiondeepsemanticrepresentationwithenhancedfeatureengineering

Similar items