Identifying Twitter users who repost unreliable news sources with linguistic information
Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handle...
Main authors: | Mu, Yida, Aletras, Nikolaos |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2020 |
Subjects: | Computational Linguistics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924477/ https://www.ncbi.nlm.nih.gov/pubmed/33816975 http://dx.doi.org/10.7717/peerj-cs.325 |
_version_ | 1783659098394329088 |
---|---|
author | Mu, Yida Aletras, Nikolaos |
author_facet | Mu, Yida Aletras, Nikolaos |
author_sort | Mu, Yida |
collection | PubMed |
description | Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handles of unreliable news sources sometime in the future is of utmost importance for early detection and prevention of disinformation diffusion in a network, and has yet to be explored. To that end, we present a novel task for predicting whether a user will repost content from Twitter handles of unreliable news sources by leveraging linguistic information from the user’s own posts. We develop a new dataset of approximately 6.2K Twitter users mapped into two categories: (1) those that have reposted content from unreliable news sources; and (2) those that repost content only from reliable sources. For our task, we evaluate a battery of supervised machine learning models as well as state-of-the-art neural models, achieving up to 79.7 macro F1. In addition, our linguistic feature analysis uncovers differences in language use and style between the two user categories. |
format | Online Article Text |
id | pubmed-7924477 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-79244772021-04-02 Identifying Twitter users who repost unreliable news sources with linguistic information Mu, Yida Aletras, Nikolaos PeerJ Comput Sci Computational Linguistics Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handles of unreliable news sources sometime in the future is of utmost importance for early detection and prevention of disinformation diffusion in a network, and has yet to be explored. To that end, we present a novel task for predicting whether a user will repost content from Twitter handles of unreliable news sources by leveraging linguistic information from the user’s own posts. We develop a new dataset of approximately 6.2K Twitter users mapped into two categories: (1) those that have reposted content from unreliable news sources; and (2) those that repost content only from reliable sources. For our task, we evaluate a battery of supervised machine learning models as well as state-of-the-art neural models, achieving up to 79.7 macro F1. In addition, our linguistic feature analysis uncovers differences in language use and style between the two user categories. PeerJ Inc. 2020-12-14 /pmc/articles/PMC7924477/ /pubmed/33816975 http://dx.doi.org/10.7717/peerj-cs.325 Text en ©2020 Mu and Aletras https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Computational Linguistics Mu, Yida Aletras, Nikolaos Identifying Twitter users who repost unreliable news sources with linguistic information |
title | Identifying Twitter users who repost unreliable news sources with linguistic information |
title_full | Identifying Twitter users who repost unreliable news sources with linguistic information |
title_fullStr | Identifying Twitter users who repost unreliable news sources with linguistic information |
title_full_unstemmed | Identifying Twitter users who repost unreliable news sources with linguistic information |
title_short | Identifying Twitter users who repost unreliable news sources with linguistic information |
title_sort | identifying twitter users who repost unreliable news sources with linguistic information |
topic | Computational Linguistics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924477/ https://www.ncbi.nlm.nih.gov/pubmed/33816975 http://dx.doi.org/10.7717/peerj-cs.325 |
work_keys_str_mv | AT muyida identifyingtwitteruserswhorepostunreliablenewssourceswithlinguisticinformation AT aletrasnikolaos identifyingtwitteruserswhorepostunreliablenewssourceswithlinguisticinformation |