Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter
In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we...
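Main authors: | Alshaabi, Thayer; Adams, Jane L.; Arnold, Michael V.; Minot, Joshua R.; Dewhurst, David R.; Reagan, Andrew J.; Danforth, Christopher M.; Dodds, Peter Sheridan |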
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Association for the Advancement of Science, 2021 |
Subjects: | Research Resource |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8284897/ https://www.ncbi.nlm.nih.gov/pubmed/34272243 http://dx.doi.org/10.1126/sciadv.abe6534 |
_version_ | 1783723475499745280 |
---|---|
author | Alshaabi, Thayer Adams, Jane L. Arnold, Michael V. Minot, Joshua R. Dewhurst, David R. Reagan, Andrew J. Danforth, Christopher M. Dodds, Peter Sheridan |
author_facet | Alshaabi, Thayer Adams, Jane L. Arnold, Michael V. Minot, Joshua R. Dewhurst, David R. Reagan, Andrew J. Danforth, Christopher M. Dodds, Peter Sheridan |
author_sort | Alshaabi, Thayer |
collection | PubMed |
description | In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing curation of over 100 billion tweets containing 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into 1-, 2-, and 3-grams across 100+ languages, generating frequencies for words, hashtags, handles, numerals, symbols, and emojis. We make the dataset available through an interactive time series viewer and as downloadable time series and daily distributions. Although Storywrangler leverages Twitter data, our method of tracking dynamic changes in n-grams can be extended to any temporally evolving corpus. Illustrating the instrument’s potential, we present example use cases including social amplification, the sociotechnical dynamics of famous individuals, box office success, and social unrest. |
format | Online Article Text |
id | pubmed-8284897 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | American Association for the Advancement of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-82848972021-08-02 Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter Alshaabi, Thayer Adams, Jane L. Arnold, Michael V. Minot, Joshua R. Dewhurst, David R. Reagan, Andrew J. Danforth, Christopher M. Dodds, Peter Sheridan Sci Adv Research Resource In real time, Twitter strongly imprints world events, popular culture, and the day-to-day, recording an ever-growing compendium of language change. Vitally, and absent from many standard corpora such as books and news archives, Twitter also encodes popularity and spreading through retweets. Here, we describe Storywrangler, an ongoing curation of over 100 billion tweets containing 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into 1-, 2-, and 3-grams across 100+ languages, generating frequencies for words, hashtags, handles, numerals, symbols, and emojis. We make the dataset available through an interactive time series viewer and as downloadable time series and daily distributions. Although Storywrangler leverages Twitter data, our method of tracking dynamic changes in n-grams can be extended to any temporally evolving corpus. Illustrating the instrument’s potential, we present example use cases including social amplification, the sociotechnical dynamics of famous individuals, box office success, and social unrest. American Association for the Advancement of Science 2021-07-16 /pmc/articles/PMC8284897/ /pubmed/34272243 http://dx.doi.org/10.1126/sciadv.abe6534 Text en Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY). 
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Resource Alshaabi, Thayer Adams, Jane L. Arnold, Michael V. Minot, Joshua R. Dewhurst, David R. Reagan, Andrew J. Danforth, Christopher M. Dodds, Peter Sheridan Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter |
title | Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter |
title_full | Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter |
title_fullStr | Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter |
title_full_unstemmed | Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter |
title_short | Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter |
title_sort | storywrangler: a massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using twitter |
topic | Research Resource |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8284897/ https://www.ncbi.nlm.nih.gov/pubmed/34272243 http://dx.doi.org/10.1126/sciadv.abe6534 |
work_keys_str_mv | AT alshaabithayer storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter AT adamsjanel storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter AT arnoldmichaelv storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter AT minotjoshuar storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter AT dewhurstdavidr storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter AT reaganandrewj storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter AT danforthchristopherm storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter AT doddspetersheridan storywrangleramassiveexploratoriumforsociolinguisticculturalsocioeconomicandpoliticaltimelinesusingtwitter |