
Identifying Twitter users who repost unreliable news sources with linguistic information


Bibliographic Details
Main authors: Mu, Yida; Aletras, Nikolaos
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Computational Linguistics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924477/
https://www.ncbi.nlm.nih.gov/pubmed/33816975
http://dx.doi.org/10.7717/peerj-cs.325
author Mu, Yida
Aletras, Nikolaos
collection PubMed
description Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handles of unreliable news sources sometime in the future is of utmost importance for early detection and prevention of disinformation diffusion in a network, and has yet to be explored. To that end, we present a novel task for predicting whether a user will repost content from Twitter handles of unreliable news sources by leveraging linguistic information from the user’s own posts. We develop a new dataset of approximately 6.2K Twitter users mapped into two categories: (1) those that have reposted content from unreliable news sources; and (2) those that repost content only from reliable sources. For our task, we evaluate a battery of supervised machine learning models as well as state-of-the-art neural models, achieving up to 79.7 macro F1. In addition, our linguistic feature analysis uncovers differences in language use and style between the two user categories.
format Online
Article
Text
id pubmed-7924477
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924477 2021-04-02 PeerJ Comput Sci Computational Linguistics PeerJ Inc. 2020-12-14 /pmc/articles/PMC7924477/ /pubmed/33816975 http://dx.doi.org/10.7717/peerj-cs.325 Text en ©2020 Mu and Aletras https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Identifying Twitter users who repost unreliable news sources with linguistic information
topic Computational Linguistics