Annotated dataset of history-related tweets
In this article, we present a dataset of history-related content obtained from social media. It contains hashtags and the tweets that include them, the output of third-party tools applied to those tweets (extracted entities, years, and URL categories), and the cate...
Main authors: | Sumikawa, Yasunobu; Jatowt, Adam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2021 |
Subjects: | Data Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427230/ https://www.ncbi.nlm.nih.gov/pubmed/34522734 http://dx.doi.org/10.1016/j.dib.2021.107344 |
_version_ | 1783750151036207104 |
---|---|
author | Sumikawa, Yasunobu Jatowt, Adam |
author_facet | Sumikawa, Yasunobu Jatowt, Adam |
author_sort | Sumikawa, Yasunobu |
collection | PubMed |
description | In this article, we present a dataset of history-related content obtained from social media. It contains hashtags and the tweets that include them, the output of third-party tools applied to those tweets (extracted entities, years, and URL categories), and the categories of the history-related hashtags we used to crawl the tweets. We collected the tweets through the official Twitter API using hashtag-based crawling, which ran from March 2016 to July 2018. During crawling, we applied a bootstrapping approach: an iterative process of collecting tweets with a small set of seed hashtags, then manually inspecting newly acquired hashtags that co-occur with the seed hashtags and adding those that are related to history. In total, we collected 147 history-related hashtags and 2,370,252 tweets. After manually examining the collected hashtags, we defined 6 categories for them. The dataset could be useful for further analysis of how people refer to history on Twitter, for collecting new history-related tweets, for training classifiers to detect history-related tweets, or for further investigation of the proposed hashtag categories. |
format | Online Article Text |
id | pubmed-8427230 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8427230 2021-09-13 Annotated dataset of history-related tweets Sumikawa, Yasunobu Jatowt, Adam Data Brief Data Article In this article, we present a dataset of history-related content obtained from social media. It contains hashtags and the tweets that include them, the output of third-party tools applied to those tweets (extracted entities, years, and URL categories), and the categories of the history-related hashtags we used to crawl the tweets. We collected the tweets through the official Twitter API using hashtag-based crawling, which ran from March 2016 to July 2018. During crawling, we applied a bootstrapping approach: an iterative process of collecting tweets with a small set of seed hashtags, then manually inspecting newly acquired hashtags that co-occur with the seed hashtags and adding those that are related to history. In total, we collected 147 history-related hashtags and 2,370,252 tweets. After manually examining the collected hashtags, we defined 6 categories for them. The dataset could be useful for further analysis of how people refer to history on Twitter, for collecting new history-related tweets, for training classifiers to detect history-related tweets, or for further investigation of the proposed hashtag categories. Elsevier 2021-09-04 /pmc/articles/PMC8427230/ /pubmed/34522734 http://dx.doi.org/10.1016/j.dib.2021.107344 Text en © 2021 The Authors. Published by Elsevier Inc. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Data Article Sumikawa, Yasunobu Jatowt, Adam Annotated dataset of history-related tweets |
title | Annotated dataset of history-related tweets |
title_full | Annotated dataset of history-related tweets |
title_fullStr | Annotated dataset of history-related tweets |
title_full_unstemmed | Annotated dataset of history-related tweets |
title_short | Annotated dataset of history-related tweets |
title_sort | annotated dataset of history-related tweets |
topic | Data Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8427230/ https://www.ncbi.nlm.nih.gov/pubmed/34522734 http://dx.doi.org/10.1016/j.dib.2021.107344 |
work_keys_str_mv | AT sumikawayasunobu annotateddatasetofhistoryrelatedtweets AT jatowtadam annotateddatasetofhistoryrelatedtweets |
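
The bootstrapping approach named in the description (collect tweets with seed hashtags, inspect co-occurring hashtags, keep the history-related ones, repeat) can be illustrated with a minimal sketch. This is not the authors' code: the function names `extract_hashtags` and `bootstrap_hashtags`, the in-memory list of tweet texts standing in for the Twitter API, the `is_history_related` callback standing in for manual inspection, and the `max_rounds` limit are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable

# Simplified hashtag pattern (an assumption; real hashtag rules are richer).
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> set[str]:
    """Return the lowercased hashtags (without '#') found in a tweet text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def bootstrap_hashtags(
    tweets: Iterable[str],
    seeds: Iterable[str],
    is_history_related: Callable[[str], bool],
    max_rounds: int = 10,
) -> tuple[set[str], list[str]]:
    """Iteratively grow a hashtag set from seeds via co-occurrence.

    `tweets` is a hypothetical stand-in for what a crawler would return,
    and `is_history_related` stands in for the manual inspection step.
    """
    corpus = list(tweets)
    accepted = {s.lower().lstrip("#") for s in seeds}
    rejected: set[str] = set()

    for _ in range(max_rounds):
        # Collect every tweet that carries at least one accepted hashtag.
        matching = [t for t in corpus if extract_hashtags(t) & accepted]

        # Hashtags that co-occur with accepted ones and are not yet reviewed.
        candidates: set[str] = set()
        for text in matching:
            candidates |= extract_hashtags(text)
        candidates -= accepted | rejected
        if not candidates:
            break  # Nothing new to inspect, so the process has converged.

        for tag in sorted(candidates):
            (accepted if is_history_related(tag) else rejected).add(tag)

    # Final collection pass with the complete accepted set.
    collected = [t for t in corpus if extract_hashtags(t) & accepted]
    return accepted, collected


if __name__ == "__main__":
    sample = [
        "Remembering the moon landing #OnThisDay #history",
        "#history of Rome #AncientRome",
        "Lunch time #food #OnThisDay",
        "Gladiators! #AncientRome #archaeology",
        "New phone #tech",
    ]
    approved = {"onthisday", "ancientrome", "archaeology"}
    tags, kept = bootstrap_hashtags(sample, ["#history"], lambda t: t in approved)
    print(sorted(tags))
    print(len(kept))
```

In this toy run the reviewer callback rejects `food`, so that hashtag never pulls in further tweets, while a tweet that pairs it with an accepted hashtag is still collected. Tracking rejected hashtags keeps each candidate from being reviewed more than once.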