Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter

In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we...

Bibliographic Details
Main Authors: Alshaabi, Thayer; Adams, Jane L.; Arnold, Michael V.; Minot, Joshua R.; Dewhurst, David R.; Reagan, Andrew J.; Danforth, Christopher M.; Dodds, Peter Sheridan
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8284897/
https://www.ncbi.nlm.nih.gov/pubmed/34272243
http://dx.doi.org/10.1126/sciadv.abe6534
author Alshaabi, Thayer
Adams, Jane L.
Arnold, Michael V.
Minot, Joshua R.
Dewhurst, David R.
Reagan, Andrew J.
Danforth, Christopher M.
Dodds, Peter Sheridan
collection PubMed
description In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing curation of over 100 billion tweets containing 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into 1-, 2-, and 3-grams across 100+ languages, generating frequencies for words, hashtags, handles, numerals, symbols, and emojis. We make the dataset available through an interactive time series viewer and as downloadable time series and daily distributions. Although Storywrangler leverages Twitter data, our method of tracking dynamic changes in n-grams can be extended to any temporally evolving corpus. Illustrating the instrument’s potential, we present example use cases including social amplification, the sociotechnical dynamics of famous individuals, box office success, and social unrest.
format Online
Article
Text
id pubmed-8284897
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Association for the Advancement of Science
record_format MEDLINE/PubMed
spelling pubmed-8284897 2021-08-02 Sci Adv Research Resource American Association for the Advancement of Science 2021-07-16 /pmc/articles/PMC8284897/ /pubmed/34272243 http://dx.doi.org/10.1126/sciadv.abe6534 Text en
Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY). This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter
topic Research Resource
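
The description in this record says that, for each day, tweets are broken into 1-, 2-, and 3-grams and turned into frequencies. The Python sketch below illustrates that general idea only. The whitespace tokenizer, the function names, and the sample tweets are all assumptions made for illustration; they are not the parser, API, or data used by Storywrangler.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Gram]:
    """Return every contiguous window of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def daily_ngram_counts(tweets: Iterable[Tuple[str, str]], n: int) -> Dict[str, Counter]:
    """Map each day to a Counter of n-gram occurrences.

    `tweets` yields (day, text) pairs. Splitting on whitespace is a
    stand-in tokenizer (an assumption), not the project's own parser.
    """
    counts: Dict[str, Counter] = {}
    for day, text in tweets:
        tokens = text.split()
        counts.setdefault(day, Counter()).update(ngrams(tokens, n))
    return counts


def relative_frequencies(counter: Counter) -> Dict[Gram, float]:
    """Normalize raw counts so that they sum to 1 for a single day."""
    total = sum(counter.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counter.items()}


# Hypothetical sample data, made up for this example.
sample = [
    ("2021-01-01", "happy new year"),
    ("2021-01-01", "happy new day"),
    ("2021-01-02", "new year new me"),
]

unigrams = daily_ngram_counts(sample, 1)
bigrams = daily_ngram_counts(sample, 2)
day_two = relative_frequencies(unigrams["2021-01-02"])
```

Keeping one `Counter` per day mirrors the daily distributions mentioned in the description, and following a single n-gram across days gives its time series.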