Annotated dataset of history-related tweets

In this article, we present a dataset containing history-related content obtained from social media. It contains hashtags and tweets that include these hashtags, as well as the results of third-party tools applied to the tweets that include extracted entities, years, and URL categories, and the cate...

Bibliographic Details
Main authors: Sumikawa, Yasunobu; Jatowt, Adam
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427230/
https://www.ncbi.nlm.nih.gov/pubmed/34522734
http://dx.doi.org/10.1016/j.dib.2021.107344
_version_ 1783750151036207104
author Sumikawa, Yasunobu
Jatowt, Adam
author_facet Sumikawa, Yasunobu
Jatowt, Adam
author_sort Sumikawa, Yasunobu
collection PubMed
description In this article, we present a dataset containing history-related content obtained from social media. It contains hashtags and the tweets that include these hashtags, the results of third-party tools applied to the tweets (extracted entities, years, and URL categories), and the categories of the history-related hashtags we used to crawl the tweets. We collected the tweets through the official Twitter API using hashtag-based crawling. The crawl ran from March 2016 to July 2018. During the crawl, we applied a bootstrapping approach: an iterative process of collecting tweets using a small set of seed hashtags and manually inspecting newly acquired hashtags that co-occur with the seed hashtags, so as to include those that are related to history. In total, we collected 147 history-related hashtags and 2,370,252 tweets. We then defined six categories for the collected hashtags after manually investigating them. The presented dataset could be useful for further analysis of how people refer to history on Twitter, for collecting new history-related tweets, for training classifiers to detect history-related tweets, or for further investigation of the proposed hashtag categories.
format Online
Article
Text
id pubmed-8427230
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8427230 2021-09-13 Annotated dataset of history-related tweets Sumikawa, Yasunobu Jatowt, Adam Data Brief Data Article In this article, we present a dataset containing history-related content obtained from social media. It contains hashtags and the tweets that include these hashtags, the results of third-party tools applied to the tweets (extracted entities, years, and URL categories), and the categories of the history-related hashtags we used to crawl the tweets. We collected the tweets through the official Twitter API using hashtag-based crawling. The crawl ran from March 2016 to July 2018. During the crawl, we applied a bootstrapping approach: an iterative process of collecting tweets using a small set of seed hashtags and manually inspecting newly acquired hashtags that co-occur with the seed hashtags, so as to include those that are related to history. In total, we collected 147 history-related hashtags and 2,370,252 tweets. We then defined six categories for the collected hashtags after manually investigating them. The presented dataset could be useful for further analysis of how people refer to history on Twitter, for collecting new history-related tweets, for training classifiers to detect history-related tweets, or for further investigation of the proposed hashtag categories. Elsevier 2021-09-04 /pmc/articles/PMC8427230/ /pubmed/34522734 http://dx.doi.org/10.1016/j.dib.2021.107344 Text en © 2021 The Authors. Published by Elsevier Inc. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Data Article
Sumikawa, Yasunobu
Jatowt, Adam
Annotated dataset of history-related tweets
title Annotated dataset of history-related tweets
title_full Annotated dataset of history-related tweets
title_fullStr Annotated dataset of history-related tweets
title_full_unstemmed Annotated dataset of history-related tweets
title_short Annotated dataset of history-related tweets
title_sort annotated dataset of history-related tweets
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427230/
https://www.ncbi.nlm.nih.gov/pubmed/34522734
http://dx.doi.org/10.1016/j.dib.2021.107344
work_keys_str_mv AT sumikawayasunobu annotateddatasetofhistoryrelatedtweets
AT jatowtadam annotateddatasetofhistoryrelatedtweets
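
The bootstrapping approach summarized in the description can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' crawler: it runs over an in-memory list of tweet texts instead of the Twitter API, and the manual inspection step is stood in for by a hypothetical `accept` callback. The names `extract_hashtags`, `bootstrap_hashtags`, `accept`, and `max_rounds`, as well as the sample tweets, are invented for the example.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Sequence

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> set[str]:
    """Return the lowercased hashtags found in a tweet's text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def bootstrap_hashtags(
    tweets: Sequence[str],
    seeds: Iterable[str],
    accept: Callable[[str], bool],  # hypothetical stand-in for manual inspection
    max_rounds: int = 5,            # assumed safety limit, not from the source
) -> tuple[set[str], list[str]]:
    """Iteratively grow a hashtag set from seeds via co-occurring hashtags."""
    accepted = {s.lower() for s in seeds}
    rejected: set[str] = set()

    for _ in range(max_rounds):
        # "Crawl" step: keep tweets containing at least one accepted hashtag.
        matched = [t for t in tweets if extract_hashtags(t) & accepted]

        # Count co-occurring hashtags that have not been reviewed yet.
        candidates: Counter[str] = Counter()
        for t in matched:
            for tag in extract_hashtags(t) - accepted - rejected:
                candidates[tag] += 1
        if not candidates:
            break

        # "Inspection" step: review candidates, most frequent first.
        newly_accepted = set()
        for tag, _count in candidates.most_common():
            if accept(tag):
                newly_accepted.add(tag)
            else:
                rejected.add(tag)
        if not newly_accepted:
            break
        accepted |= newly_accepted

    collected = [t for t in tweets if extract_hashtags(t) & accepted]
    return accepted, collected


# Invented sample data for illustration only.
sample_tweets = [
    "#history On this day in 1969 #otd",
    "#otd the Berlin Wall fell #coldwar",
    "Lunch time #food",
    "#history lovers rejoice #monday",
    "#coldwar archives opened",
]
tags, collected = bootstrap_hashtags(
    sample_tweets,
    seeds={"history"},
    accept=lambda tag: tag in {"otd", "coldwar"},
)
```

In this toy run the seed `history` pulls in `otd`, which in turn pulls in `coldwar`, while `monday` is reviewed once and rejected. Modelling the review as a callback keeps the loop testable; in a real crawl that step would be a human decision, as the description states.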