Using of n-grams from morphological tags for fake news classification

Research into techniques for effective fake news detection has become both necessary and attractive. These techniques draw on many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging have proven insuff...

Bibliographic Details
Main Authors:	Kapusta, Jozef; Drlik, Martin; Munk, Michal
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Computational Linguistics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323729/ https://www.ncbi.nlm.nih.gov/pubmed/34395862 http://dx.doi.org/10.7717/peerj-cs.624

_version_	1783731300082909184
author	Kapusta, Jozef Drlik, Martin Munk, Michal
author_facet	Kapusta, Jozef Drlik, Martin Munk, Michal
author_sort	Kapusta, Jozef
collection	PubMed
description	Research into techniques for effective fake news detection has become both necessary and attractive. These techniques draw on many research disciplines, including morphological analysis. Several researchers have stated that simple content-related n-grams and POS tagging have proven insufficient for fake news classification. However, no empirical results confirming these statements experimentally have been published in the last decade. Considering this contradiction, the main aim of the paper is to experimentally evaluate the potential of the combined use of n-grams and POS tags for correctly classifying fake and true news. A dataset of published fake or real news about the Covid-19 pandemic was pre-processed using morphological analysis. As a result, n-grams of POS tags were prepared and further analysed. Three techniques based on POS tags were proposed and applied to different groups of n-grams in the pre-processing phase of fake news detection. The n-gram size was examined first. Subsequently, the most suitable depth of the decision trees for sufficient generalization was determined. Finally, the performance measures of models based on the proposed techniques were compared with the standard reference TF-IDF technique. The performance measures considered were accuracy, precision, recall and F1-score, evaluated with 10-fold cross-validation. At the same time, the question of whether the TF-IDF technique can be improved using POS tags was examined in detail. The results showed that the newly proposed techniques are comparable with the traditional TF-IDF technique. Moreover, morphological analysis can improve the baseline TF-IDF technique: precision for fake news and recall for real news were statistically significantly improved.
format	Online Article Text
id	pubmed-8323729
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8323729 2021-08-13 Using of n-grams from morphological tags for fake news classification Kapusta, Jozef Drlik, Martin Munk, Michal PeerJ Comput Sci Computational Linguistics PeerJ Inc. 2021-07-19 /pmc/articles/PMC8323729/ /pubmed/34395862 http://dx.doi.org/10.7717/peerj-cs.624 Text en © 2021 Kapusta et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle	Computational Linguistics Kapusta, Jozef Drlik, Martin Munk, Michal Using of n-grams from morphological tags for fake news classification
title	Using of n-grams from morphological tags for fake news classification
topic	Computational Linguistics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8323729/ https://www.ncbi.nlm.nih.gov/pubmed/34395862 http://dx.doi.org/10.7717/peerj-cs.624
work_keys_str_mv	AT kapustajozef usingofngramsfrommorphologicaltagsforfakenewsclassification AT drlikmartin usingofngramsfrommorphologicaltagsforfakenewsclassification AT munkmichal usingofngramsfrommorphologicaltagsforfakenewsclassification
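
The abstract in this record describes preparing n-grams of POS tags as input features for classification. A minimal sketch of that feature-extraction step is given below, assuming the text has already been tagged. The function names, the Penn Treebank-style tag set, and the hand-tagged example sentence are illustrative assumptions and are not taken from the paper; the three specific techniques the authors propose are not reproduced here.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def pos_ngrams(tags: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams over a sequence of POS tags."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # If the sequence is shorter than n, the range is empty and so is the result.
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def ngram_features(tags: Sequence[str], sizes: Sequence[int] = (1, 2, 3)) -> Counter:
    """Count POS-tag n-grams of several sizes; each key joins its tags with '_'."""
    counts: Counter = Counter()
    for n in sizes:
        counts.update("_".join(gram) for gram in pos_ngrams(tags, n))
    return counts


# Hypothetical hand-tagged sentence: "The vaccine cures everything"
# (tags are an assumption for illustration, not output of a real tagger).
example_tags = ["DT", "NN", "VBZ", "NN"]
features = ngram_features(example_tags, sizes=(1, 2))
```

The resulting counts could then be vectorised and passed to a classifier such as a decision tree; that step is omitted here because the record does not specify the tooling used.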
