
Dataset of discourses about COVID-19 and financial markets from Twitter

In this data article, a collection of 11,625,887 tweets on the topic of the COVID-19 pandemic is provided. The data were collected through the Twitter API from January 2020 to June 2020. In addition, we also provide subsets of tweets containing discourses on both COVID-19 and financial to...


Bibliographic Details
Main author: Ngo, Vu Minh
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9270199/
https://www.ncbi.nlm.nih.gov/pubmed/35818354
http://dx.doi.org/10.1016/j.dib.2022.108428
author Ngo, Vu Minh
collection PubMed
description In this data article, a collection of 11,625,887 tweets on the topic of the COVID-19 pandemic is provided. The data were collected through the Twitter API from January 2020 to June 2020. In addition, we also provide subsets of tweets containing discourses on both COVID-19 and financial topics. To facilitate research on sentiment analysis, the Sentiment140 dataset, containing 1,600,000 tweets annotated as positive or negative in sentiment, is also provided (Go et al., 2009). We used the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to transform documents into numeric vectors and a logistic regression classifier to train on and predict the sentiment of tweets. These datasets may be of interest to data scientists, economists, social scientists, and researchers in natural language processing, epidemiology, and public health.
format Online
Article
Text
id pubmed-9270199
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9270199 2022-07-10 Dataset of discourses about COVID-19 and financial markets from Twitter Ngo, Vu Minh Data Brief Data Article In this data article, a collection of 11,625,887 tweets on the topic of the COVID-19 pandemic is provided. The data were collected through the Twitter API from January 2020 to June 2020. In addition, we also provide subsets of tweets containing discourses on both COVID-19 and financial topics. To facilitate research on sentiment analysis, the Sentiment140 dataset, containing 1,600,000 tweets annotated as positive or negative in sentiment, is also provided (Go et al., 2009). We used the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to transform documents into numeric vectors and a logistic regression classifier to train on and predict the sentiment of tweets. These datasets may be of interest to data scientists, economists, social scientists, and researchers in natural language processing, epidemiology, and public health. Elsevier 2022-06-30 /pmc/articles/PMC9270199/ /pubmed/35818354 http://dx.doi.org/10.1016/j.dib.2022.108428 Text en © 2022 The Author(s). Published by Elsevier Inc. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Dataset of discourses about COVID-19 and financial markets from Twitter
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9270199/
https://www.ncbi.nlm.nih.gov/pubmed/35818354
http://dx.doi.org/10.1016/j.dib.2022.108428
work_keys_str_mv AT ngovuminh datasetofdiscoursesaboutcovid19andfinancialmarketsfromtwitter
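
The description above mentions transforming tweets into numeric vectors with TF-IDF and classifying sentiment with logistic regression. The following is a minimal sketch of that general technique in plain Python, not the author's actual pipeline. The tokenizer, the smoothed IDF variant, the training settings (`epochs`, `lr`), all function names, and the toy sentences are assumptions made for illustration; none of them come from the article or the dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each term."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Map a document to a sparse, L2-normalised TF-IDF vector (a dict)."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
            err = sigmoid(z) - y  # gradient of the log loss with respect to z
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


def predict_proba(doc, idf, weights, bias):
    """Return the estimated probability that a document is positive."""
    vec = tfidf_vector(doc, idf)
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))


# Invented toy examples (1 = positive, 0 = negative); not taken from the dataset.
train_texts = [
    "markets rally on vaccine hopes",
    "great news stocks recover",
    "stocks crash as lockdown fears grow",
    "terrible losses as markets crash",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_texts)
vectors = [tfidf_vector(t, idf) for t in train_texts]
weights, bias = train_logreg(vectors, train_labels)

p_pos = predict_proba("stocks rally on great news", idf, weights, bias)
p_neg = predict_proba("terrible crash fears", idf, weights, bias)
```

In this sketch, `p_pos` and `p_neg` are the predicted probabilities of positive sentiment for two unseen sentences. With a labelled corpus of the size of Sentiment140, a library implementation with proper tokenization and regularization would be the more practical choice.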