BillionCOV: An enriched billion-scale collection of COVID-19 tweets for efficient hydration

Bibliographic Details
Main authors: Lamsal, Rabindra; Read, Maria Rodriguez; Karunasekera, Shanika
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10175078/
https://www.ncbi.nlm.nih.gov/pubmed/37223279
http://dx.doi.org/10.1016/j.dib.2023.109229
Description
Summary: The COVID-19 pandemic has introduced new norms, such as social distancing, face masks, quarantine, lockdowns, travel restrictions, work/study from home, and business closures, to name a few. The pandemic’s seriousness has made people vocal on social media, especially on microblogs such as Twitter. Since the early days of the outbreak, researchers have been collecting and sharing large-scale datasets of COVID-19 tweets. However, the existing datasets carry issues related to proportion and redundancy. We report that more than 500 million tweet identifiers point to deleted or protected tweets. To address these issues, this paper introduces an enriched global billion-scale English-language COVID-19 tweets dataset, BillionCOV, which contains 1.4 billion tweets originating from 240 countries and territories between October 2019 and April 2022. Importantly, BillionCOV allows researchers to filter tweet identifiers for efficient hydration. We anticipate that a dataset of this scale, with global scope and extended temporal coverage, will aid in obtaining a thorough understanding of the pandemic’s conversational dynamics.