Identifying Twitter users who repost unreliable news sources with linguistic information

Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handle...

Bibliographic Details
Main Authors:	Mu, Yida; Aletras, Nikolaos
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2020
Subjects:	Computational Linguistics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924477/ https://www.ncbi.nlm.nih.gov/pubmed/33816975 http://dx.doi.org/10.7717/peerj-cs.325

_version_	1783659098394329088
author	Mu, Yida Aletras, Nikolaos
author_facet	Mu, Yida Aletras, Nikolaos
author_sort	Mu, Yida
collection	PubMed
description	Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handles of unreliable news sources sometime in the future is of utmost importance for early detection and prevention of disinformation diffusion in a network, and has yet to be explored. To that end, we present a novel task for predicting whether a user will repost content from Twitter handles of unreliable news sources by leveraging linguistic information from the user’s own posts. We develop a new dataset of approximately 6.2K Twitter users mapped into two categories: (1) those that have reposted content from unreliable news sources; and (2) those that repost content only from reliable sources. For our task, we evaluate a battery of supervised machine learning models as well as state-of-the-art neural models, achieving up to 79.7 macro F1. In addition, our linguistic feature analysis uncovers differences in language use and style between the two user categories.
format	Online Article Text
id	pubmed-7924477
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-7924477 2021-04-02 Identifying Twitter users who repost unreliable news sources with linguistic information Mu, Yida Aletras, Nikolaos PeerJ Comput Sci Computational Linguistics Social media has become a popular source for online news consumption with millions of users worldwide. However, it has become a primary platform for spreading disinformation with severe societal implications. Automatically identifying social media users that are likely to propagate posts from handles of unreliable news sources sometime in the future is of utmost importance for early detection and prevention of disinformation diffusion in a network, and has yet to be explored. To that end, we present a novel task for predicting whether a user will repost content from Twitter handles of unreliable news sources by leveraging linguistic information from the user’s own posts. We develop a new dataset of approximately 6.2K Twitter users mapped into two categories: (1) those that have reposted content from unreliable news sources; and (2) those that repost content only from reliable sources. For our task, we evaluate a battery of supervised machine learning models as well as state-of-the-art neural models, achieving up to 79.7 macro F1. In addition, our linguistic feature analysis uncovers differences in language use and style between the two user categories. PeerJ Inc. 2020-12-14 /pmc/articles/PMC7924477/ /pubmed/33816975 http://dx.doi.org/10.7717/peerj-cs.325 Text en ©2020 Mu and Aletras https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle	Computational Linguistics Mu, Yida Aletras, Nikolaos Identifying Twitter users who repost unreliable news sources with linguistic information
title	Identifying Twitter users who repost unreliable news sources with linguistic information
title_full	Identifying Twitter users who repost unreliable news sources with linguistic information
title_fullStr	Identifying Twitter users who repost unreliable news sources with linguistic information
title_full_unstemmed	Identifying Twitter users who repost unreliable news sources with linguistic information
title_short	Identifying Twitter users who repost unreliable news sources with linguistic information
title_sort	identifying twitter users who repost unreliable news sources with linguistic information
topic	Computational Linguistics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924477/ https://www.ncbi.nlm.nih.gov/pubmed/33816975 http://dx.doi.org/10.7717/peerj-cs.325
work_keys_str_mv	AT muyida identifyingtwitteruserswhorepostunreliablenewssourceswithlinguisticinformation AT aletrasnikolaos identifyingtwitteruserswhorepostunreliablenewssourceswithlinguisticinformation
