How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models

Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and public society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several fea...

Bibliographic Details
Main Authors: Malik, Muhammad Shahid Iqbal; Imran, Tahir; Mona Mamdouh, Jamjoom
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280574/
https://www.ncbi.nlm.nih.gov/pubmed/37346552
http://dx.doi.org/10.7717/peerj-cs.1248
author Malik, Muhammad Shahid Iqbal
Imran, Tahir
Mona Mamdouh, Jamjoom
collection PubMed
description Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and public society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several feature models are explored to develop a robust model such as part-of-speech, LIWC, word uni-gram, Embeddings from Language Models (ELMo), FastText, word2vec, latent semantic analysis (LSA), and char tri-gram feature models. Moreover, fine-tuning of the BERT is also performed. Three oversampling methods are investigated to handle the imbalance status of the Qprop dataset. SMOTE Edited Nearest Neighbors (ENN) presented the best results. The fine-tuning of BERT revealed that the BERT-320 sequence length is the best model. As a standalone model, the char tri-gram presented superior performance as compared to other features. The robust performance is observed against the combination of char tri-gram + BERT and char tri-gram + word2vec and they outperformed the two state-of-the-art baselines. In contrast to prior approaches, the addition of feature selection further improves the performance and achieved more than 97.60% recall, f1-score, and AUC on the dev and test part of the dataset. The findings of the present study can be used to organize news articles for various public news websites.
format Online
Article
Text
id pubmed-10280574
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-102805742023-06-21 How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models Malik, Muhammad Shahid Iqbal Imran, Tahir Mona Mamdouh, Jamjoom PeerJ Comput Sci Data Mining and Machine Learning Online propaganda is a mechanism to influence the opinions of social media users. It is a growing menace to public health, democratic institutions, and public society. The present study proposes a propaganda detection framework as a binary classification model based on a news repository. Several feature models are explored to develop a robust model such as part-of-speech, LIWC, word uni-gram, Embeddings from Language Models (ELMo), FastText, word2vec, latent semantic analysis (LSA), and char tri-gram feature models. Moreover, fine-tuning of the BERT is also performed. Three oversampling methods are investigated to handle the imbalance status of the Qprop dataset. SMOTE Edited Nearest Neighbors (ENN) presented the best results. The fine-tuning of BERT revealed that the BERT-320 sequence length is the best model. As a standalone model, the char tri-gram presented superior performance as compared to other features. The robust performance is observed against the combination of char tri-gram + BERT and char tri-gram + word2vec and they outperformed the two state-of-the-art baselines. In contrast to prior approaches, the addition of feature selection further improves the performance and achieved more than 97.60% recall, f1-score, and AUC on the dev and test part of the dataset. The findings of the present study can be used to organize news articles for various public news websites. PeerJ Inc. 2023-02-20 /pmc/articles/PMC10280574/ /pubmed/37346552 http://dx.doi.org/10.7717/peerj-cs.1248 Text en ©2023 Malik et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title How to detect propaganda from social media? Exploitation of semantic and fine-tuned language models
topic Data Mining and Machine Learning
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280574/
https://www.ncbi.nlm.nih.gov/pubmed/37346552
http://dx.doi.org/10.7717/peerj-cs.1248