A network-based positive and unlabeled learning approach for fake news detection
Fake news can rapidly spread through internet users and can deceive a large audience. Due to those characteristics, they can have a direct impact on political and economic events. Machine Learning approaches have been used to assist fake news identification. However, since the spectrum of real news...
Main authors: | de Souza, Mariana Caravanti; Nogueira, Bruno Magalhães; Rossi, Rafael Geraldeli; Marcacini, Ricardo Marcondes; dos Santos, Brucce Neves; Rezende, Solange Oliveira |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8601374/ https://www.ncbi.nlm.nih.gov/pubmed/34815619 http://dx.doi.org/10.1007/s10994-021-06111-6 |
_version_ | 1784601334254665728 |
---|---|
author | de Souza, Mariana Caravanti; Nogueira, Bruno Magalhães; Rossi, Rafael Geraldeli; Marcacini, Ricardo Marcondes; dos Santos, Brucce Neves; Rezende, Solange Oliveira |
author_sort | de Souza, Mariana Caravanti |
collection | PubMed |
description | Fake news can spread rapidly among internet users and deceive a large audience. Because of these characteristics, it can have a direct impact on political and economic events. Machine learning approaches have been used to assist fake news identification. However, since the spectrum of real news is broad, hard to characterize, and expensive to label because of its high update frequency, One-Class Learning (OCL) and Positive and Unlabeled Learning (PUL) emerge as interesting approaches for content-based fake news detection, using a smaller set of labeled data than traditional machine learning techniques. In particular, network-based approaches are well suited to fake news detection because they allow information from different aspects of a publication to be incorporated into the problem modeling. In this paper, we propose a network-based approach based on Positive and Unlabeled Learning by Label Propagation (PU-LP), a one-class, transductive semi-supervised learning algorithm that performs classification by first identifying potential interest and non-interest documents in the unlabeled data and then propagating labels to classify the remaining unlabeled documents. We assessed the performance of our proposal considering homogeneous (documents only) and heterogeneous (documents and terms) networks. Our comparative analysis considered four OCL algorithms extensively employed in one-class text classification (k-Means, k-Nearest Neighbors Density-based, One-Class Support Vector Machine, and Dense Autoencoder) and another traditional PUL algorithm (Rocchio Support Vector Machine). The algorithms were evaluated on three news collections, considering balanced and extremely unbalanced scenarios. We used Bag-of-Words and Doc2Vec models to transform news into structured data. Results indicated that PU-LP approaches are more stable and achieve better results than other PUL and OCL approaches in most scenarios, performing similarly to semi-supervised binary algorithms. Also, including terms in the news network yields better results, especially when news items are distributed in the feature space according to veracity and subject. News representation using Doc2Vec achieved better results than the Bag-of-Words model for algorithms based on both the vector-space model and the document similarity network. |
format | Online Article Text |
id | pubmed-8601374 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-8601374 2021-11-19 Mach Learn Article Springer US 2021-11-18 2022 /pmc/articles/PMC8601374/ /pubmed/34815619 http://dx.doi.org/10.1007/s10994-021-06111-6 Text en © The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | A network-based positive and unlabeled learning approach for fake news detection |
title_sort | network-based positive and unlabeled learning approach for fake news detection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8601374/ https://www.ncbi.nlm.nih.gov/pubmed/34815619 http://dx.doi.org/10.1007/s10994-021-06111-6 |
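The two-stage idea summarized in the description (first pick likely non-interest documents from the unlabeled set, then propagate labels over a document network) can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify the similarity measure, the selection rule, or the propagation algorithm, so cosine similarity, a lowest-mean-similarity rule, and plain neighbor averaging are stand-in assumptions here. The function names, the parameters `k` and `n_neg`, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_graph(vectors, k):
    """Undirected k-nearest-neighbor graph as a list of {neighbor: weight} dicts."""
    n = len(vectors)
    weights = [dict() for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            if s > 0:
                weights[i][j] = s
                weights[j][i] = s
    return weights


def pick_reliable_negatives(vectors, positives, n_neg):
    """Assumed rule: unlabeled documents least similar to the labeled positives."""
    pos = set(positives)
    scored = []
    for i, v in enumerate(vectors):
        if i in pos:
            continue
        mean_sim = sum(cosine(v, vectors[p]) for p in positives) / len(positives)
        scored.append((mean_sim, i))
    scored.sort()
    return [i for _, i in scored[:n_neg]]


def propagate(weights, positives, negatives, iters=50):
    """Iterative neighbor averaging with seed scores clamped to +1 / -1."""
    clamp = {i: 1.0 for i in positives}
    clamp.update({i: -1.0 for i in negatives})
    scores = [clamp.get(i, 0.0) for i in range(len(weights))]
    for _ in range(iters):
        new = scores[:]
        for i, nbrs in enumerate(weights):
            if i in clamp:
                continue
            total = sum(nbrs.values())
            if total > 0:
                new[i] = sum(w * scores[j] for j, w in nbrs.items()) / total
        scores = new
    return scores


def pu_lp_sketch(vectors, positives, k=2, n_neg=1):
    """Return 1 (interest class, e.g. fake) or 0 for every document."""
    weights = knn_graph(vectors, k)
    negatives = pick_reliable_negatives(vectors, positives, n_neg)
    scores = propagate(weights, positives, negatives)
    return [1 if s > 0 else 0 for s in scores]


# Toy 2-D "document embeddings" (hypothetical); only document 0 is labeled.
docs = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.3], [0.1, 1.0], [0.2, 0.9], [0.3, 0.8]]
labels = pu_lp_sketch(docs, positives=[0], k=2, n_neg=1)
print(labels)
```

In practice the toy vectors would be replaced by Bag-of-Words or Doc2Vec representations, and a heterogeneous variant would add term nodes to the graph alongside document nodes.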