The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure

Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection either to allow direct comparisons between studies or to support replication. The three primary application programming in...

Bibliographic Details
Main Authors: Kim, Yoonsang; Nordgren, Rachel; Emery, Sherry
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7037495/
https://www.ncbi.nlm.nih.gov/pubmed/32019070
http://dx.doi.org/10.3390/ijerph17030864
author Kim, Yoonsang
Nordgren, Rachel
Emery, Sherry
collection PubMed
description Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection either to allow direct comparisons between studies or to support replication. The three primary application programming interfaces (APIs) of Twitter data sources are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about the comparability of the amount, content, and user accounts of the tweets retrieved from each API. Such information is crucial to the validity, interpretation, and replicability of research findings. This study examines whether tweets collected using the same search filters over the same time period, but calling different APIs, would yield comparable datasets. We collected tweets about anti-smoking, e-cigarettes, and tobacco using the aforementioned APIs. The retrieved tweets largely overlapped among the three APIs, but each API also retrieved unique tweets, and the extent of overlap varied over time and by topic, resulting in different trends and potentially supporting diverging inferences. Researchers need to understand how different data sources can influence the amount, content, and user accounts of the data they retrieve from social media, in order to assess the implications of their choice of data source.
format Online
Article
Text
id pubmed-7037495
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7037495 2020-03-11 Int J Environ Res Public Health Article MDPI 2020-01-30 2020-02 /pmc/articles/PMC7037495/ /pubmed/32019070 http://dx.doi.org/10.3390/ijerph17030864 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
topic Article