
A network-based positive and unlabeled learning approach for fake news detection

Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news...


Bibliographic Details
Main Authors:	de Souza, Mariana Caravanti, Nogueira, Bruno Magalhães, Rossi, Rafael Geraldeli, Marcacini, Ricardo Marcondes, dos Santos, Brucce Neves, Rezende, Solange Oliveira
Format:	Online Article Text
Language:	English
Published:	Springer US 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8601374/ https://www.ncbi.nlm.nih.gov/pubmed/34815619 http://dx.doi.org/10.1007/s10994-021-06111-6

_version_	1784601334254665728
author	de Souza, Mariana Caravanti Nogueira, Bruno Magalhães Rossi, Rafael Geraldeli Marcacini, Ricardo Marcondes dos Santos, Brucce Neves Rezende, Solange Oliveira
author_facet	de Souza, Mariana Caravanti Nogueira, Bruno Magalhães Rossi, Rafael Geraldeli Marcacini, Ricardo Marcondes dos Santos, Brucce Neves Rezende, Solange Oliveira
author_sort	de Souza, Mariana Caravanti
collection	PubMed
description	Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to label because of its high update frequency, One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) emerge as interesting approaches for content-based fake news detection that use a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are well suited to fake news detection since they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class and transductive semi-supervised learning algorithm that performs classification by first identifying potential interest and non-interest documents in the unlabeled data and then propagating labels to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data. Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, including terms in the news network yields better results, especially when news is distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model for both the algorithms based on the vector-space model and those based on the document similarity network.
format	Online Article Text
id	pubmed-8601374
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-8601374 2021-11-19 A network-based positive and unlabeled learning approach for fake news detection de Souza, Mariana Caravanti Nogueira, Bruno Magalhães Rossi, Rafael Geraldeli Marcacini, Ricardo Marcondes dos Santos, Brucce Neves Rezende, Solange Oliveira Mach Learn Article Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to label because of its high update frequency, One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) emerge as interesting approaches for content-based fake news detection that use a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are well suited to fake news detection since they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class and transductive semi-supervised learning algorithm that performs classification by first identifying potential interest and non-interest documents in the unlabeled data and then propagating labels to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (only documents) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data. Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, including terms in the news network yields better results, especially when news is distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model for both the algorithms based on the vector-space model and those based on the document similarity network. Springer US 2021-11-18 2022 /pmc/articles/PMC8601374/ /pubmed/34815619 http://dx.doi.org/10.1007/s10994-021-06111-6 Text en © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article de Souza, Mariana Caravanti Nogueira, Bruno Magalhães Rossi, Rafael Geraldeli Marcacini, Ricardo Marcondes dos Santos, Brucce Neves Rezende, Solange Oliveira A network-based positive and unlabeled learning approach for fake news detection
title	A network-based positive and unlabeled learning approach for fake news detection
title_full	A network-based positive and unlabeled learning approach for fake news detection
title_fullStr	A network-based positive and unlabeled learning approach for fake news detection
title_full_unstemmed	A network-based positive and unlabeled learning approach for fake news detection
title_short	A network-based positive and unlabeled learning approach for fake news detection
title_sort	network-based positive and unlabeled learning approach for fake news detection
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8601374/ https://www.ncbi.nlm.nih.gov/pubmed/34815619 http://dx.doi.org/10.1007/s10994-021-06111-6
work_keys_str_mv	AT desouzamarianacaravanti anetworkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT nogueirabrunomagalhaes anetworkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT rossirafaelgeraldeli anetworkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT marcaciniricardomarcondes anetworkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT dossantosbrucceneves anetworkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT rezendesolangeoliveira anetworkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT desouzamarianacaravanti networkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT nogueirabrunomagalhaes networkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT rossirafaelgeraldeli networkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT marcaciniricardomarcondes networkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT dossantosbrucceneves networkbasedpositiveandunlabeledlearningapproachforfakenewsdetection AT rezendesolangeoliveira networkbasedpositiveandunlabeledlearningapproachforfakenewsdetection


Similar Items