A network-based positive and unlabeled learning approach for fake news detection

Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news...

Bibliographic Details
Main authors: de Souza, Mariana Caravanti, Nogueira, Bruno Magalhães, Rossi, Rafael Geraldeli, Marcacini, Ricardo Marcondes, dos Santos, Brucce Neves, Rezende, Solange Oliveira
Format: Online Article Text
Language: English
Published: Springer US 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8601374/
https://www.ncbi.nlm.nih.gov/pubmed/34815619
http://dx.doi.org/10.1007/s10994-021-06111-6
author de Souza, Mariana Caravanti
Nogueira, Bruno Magalhães
Rossi, Rafael Geraldeli
Marcacini, Ricardo Marcondes
dos Santos, Brucce Neves
Rezende, Solange Oliveira
collection PubMed
description Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to label due to its high update frequency, One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) emerge as interesting approaches for content-based fake news detection, using a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are well suited to fake news detection since they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class and transductive semi-supervised learning algorithm that performs classification by first identifying potential interest and non-interest documents in the unlabeled data and then propagating labels to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data. Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, including terms in the news network yields better results, especially when news is distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model for both the algorithms based on the vector-space model and those based on the document similarity network.
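The two-stage idea summarized in the description (first pick likely interest and non-interest documents from the unlabeled data, then propagate labels over a document network) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `knn_graph` and `pu_lp_sketch`, the parameters `k`, `m`, and `iters`, the use of mean cosine similarity to rank unlabeled documents, and the clamped iterative propagation are all assumptions made for illustration; the record does not specify which similarity measure or propagation algorithm the paper uses.

```python
import numpy as np

def knn_graph(X, k):
    """Build a symmetric k-nearest-neighbor graph from cosine similarity.

    Assumes every row of X is a non-zero document vector
    (for example Bag-of-Words or Doc2Vec features).
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)               # a document is not its own neighbor
    nbrs = np.argsort(-S, axis=1)[:, :k]       # indices of the k most similar docs
    A = np.zeros_like(S)
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T), Xn              # symmetrize the adjacency matrix

def pu_lp_sketch(X, pos_idx, k=3, m=2, iters=100):
    """Hypothetical two-stage positive/unlabeled classifier.

    Stage 1 ranks unlabeled documents by mean cosine similarity to the
    labeled positives (an assumed stand-in for the paper's measure) and
    takes the top m as reliable interest and the bottom m as reliable
    non-interest documents. Requires 2 * m <= number of unlabeled docs.
    Stage 2 propagates labels over the k-NN graph, re-clamping the seeds
    after every iteration.
    """
    n = len(X)
    A, Xn = knn_graph(X, k)
    pos_idx = np.asarray(pos_idx)
    unl = np.setdiff1d(np.arange(n), pos_idx)

    # Stage 1: select reliable interest / non-interest documents.
    score = (Xn[unl] @ Xn[pos_idx].T).mean(axis=1)
    order = unl[np.argsort(-score)]
    seeds_pos = np.concatenate([pos_idx, order[:m]])
    seeds_neg = order[-m:]

    # Stage 2: label propagation with clamped seeds.
    F = np.full((n, 2), 0.5)                   # columns: [interest, non-interest]
    P = A / A.sum(axis=1, keepdims=True)       # row-normalized transition matrix
    for _ in range(iters):
        F[seeds_pos] = [1.0, 0.0]
        F[seeds_neg] = [0.0, 1.0]
        F = P @ F
    F[seeds_pos] = [1.0, 0.0]
    F[seeds_neg] = [0.0, 1.0]
    return F[:, 0] > F[:, 1]                   # True = classified as interest
```

Calling `pu_lp_sketch(X, pos_idx)` with a feature matrix `X` and the indices of a few labeled fake-news documents returns one boolean per document. Only documents of the interest class need labels, which is the practical appeal of the PUL setting described above.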
format Online
Article
Text
id pubmed-8601374
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-8601374 2021-11-19 A network-based positive and unlabeled learning approach for fake news detection de Souza, Mariana Caravanti Nogueira, Bruno Magalhães Rossi, Rafael Geraldeli Marcacini, Ricardo Marcondes dos Santos, Brucce Neves Rezende, Solange Oliveira Mach Learn Article Springer US 2021-11-18 2022 /pmc/articles/PMC8601374/ /pubmed/34815619 http://dx.doi.org/10.1007/s10994-021-06111-6 Text en © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title A network-based positive and unlabeled learning approach for fake news detection
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8601374/
https://www.ncbi.nlm.nih.gov/pubmed/34815619
http://dx.doi.org/10.1007/s10994-021-06111-6