
An augmented multilingual Twitter dataset for studying the COVID-19 infodemic

This work presents an openly available dataset to facilitate researchers’ exploration and hypothesis testing about the social discourse of the COVID-19 pandemic. The dataset currently consists of over 2.2 billion tweets (count as of September 2021) from all over the world, in multiple languages....


Bibliographic Details
Main Authors: Lopez, Christian E., Gallemore, Caleb
Format: Online Article Text
Language: English
Published: Springer Vienna 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528187/
https://www.ncbi.nlm.nih.gov/pubmed/34697560
http://dx.doi.org/10.1007/s13278-021-00825-0
_version_ 1784586207245631488
author Lopez, Christian E.
Gallemore, Caleb
author_facet Lopez, Christian E.
Gallemore, Caleb
author_sort Lopez, Christian E.
collection PubMed
description This work presents an openly available dataset to facilitate researchers’ exploration and hypothesis testing about the social discourse of the COVID-19 pandemic. The dataset currently consists of over 2.2 billion tweets (count as of September 2021) from all over the world, in multiple languages. Tweets start from January 22, 2020, when the total number of reported COVID-19 cases worldwide was below 600. The dataset was collected using the Twitter API and by rehydrating tweets from other available datasets; data collection is ongoing as of the time of writing. To facilitate hypothesis testing and exploration of social discourse, the English and Spanish tweets have been augmented with state-of-the-art Twitter sentiment and named entity recognition algorithms. The dataset and the summary files provided allow researchers to avoid some computationally intensive analyses, facilitating more widespread use of social media data to gain insights on issues such as (mis)information diffusion, semantic networks, sentiments, and the evolution of COVID-19 discussions. In addition, the dataset provides an archive for researchers in the social sciences wishing to have access to a dataset covering the entire duration of the pandemic.
format Online
Article
Text
id pubmed-8528187
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer Vienna
record_format MEDLINE/PubMed
spelling pubmed-85281872021-10-21 An augmented multilingual Twitter dataset for studying the COVID-19 infodemic Lopez, Christian E. Gallemore, Caleb Soc Netw Anal Min Review Paper This work presents an openly available dataset to facilitate researchers’ exploration and hypothesis testing about the social discourse of the COVID-19 pandemic. The dataset currently consists of over 2.2 billion tweets (count as of September 2021) from all over the world, in multiple languages. Tweets start from January 22, 2020, when the total number of reported COVID-19 cases worldwide was below 600. The dataset was collected using the Twitter API and by rehydrating tweets from other available datasets; data collection is ongoing as of the time of writing. To facilitate hypothesis testing and exploration of social discourse, the English and Spanish tweets have been augmented with state-of-the-art Twitter sentiment and named entity recognition algorithms. The dataset and the summary files provided allow researchers to avoid some computationally intensive analyses, facilitating more widespread use of social media data to gain insights on issues such as (mis)information diffusion, semantic networks, sentiments, and the evolution of COVID-19 discussions. In addition, the dataset provides an archive for researchers in the social sciences wishing to have access to a dataset covering the entire duration of the pandemic. Springer Vienna 2021-10-20 2021 /pmc/articles/PMC8528187/ /pubmed/34697560 http://dx.doi.org/10.1007/s13278-021-00825-0 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Review Paper
Lopez, Christian E.
Gallemore, Caleb
An augmented multilingual Twitter dataset for studying the COVID-19 infodemic
title An augmented multilingual Twitter dataset for studying the COVID-19 infodemic
title_full An augmented multilingual Twitter dataset for studying the COVID-19 infodemic
title_fullStr An augmented multilingual Twitter dataset for studying the COVID-19 infodemic
title_full_unstemmed An augmented multilingual Twitter dataset for studying the COVID-19 infodemic
title_short An augmented multilingual Twitter dataset for studying the COVID-19 infodemic
title_sort augmented multilingual twitter dataset for studying the covid-19 infodemic
topic Review Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528187/
https://www.ncbi.nlm.nih.gov/pubmed/34697560
http://dx.doi.org/10.1007/s13278-021-00825-0
work_keys_str_mv AT lopezchristiane anaugmentedmultilingualtwitterdatasetforstudyingthecovid19infodemic
AT gallemorecaleb anaugmentedmultilingualtwitterdatasetforstudyingthecovid19infodemic
AT lopezchristiane augmentedmultilingualtwitterdatasetforstudyingthecovid19infodemic
AT gallemorecaleb augmentedmultilingualtwitterdatasetforstudyingthecovid19infodemic