Sweet tweets! Evaluating a new approach for probability-based sampling of Twitter

As survey costs continue to rise and response rates decline, researchers are seeking more cost-effective ways to collect, analyze and process social and public opinion data. These issues have created an opportunity and interest in expanding the fit-for-purpose paradigm to include alternate sources s...

Bibliographic Details
Main Authors:	Buskirk, Trent D., Blakely, Brian P., Eck, Adam, McGrath, Richard, Singh, Ravinder, Yu, Youzhi
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2022
Subjects:	Regular Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8857877/ https://www.ncbi.nlm.nih.gov/pubmed/35223365 http://dx.doi.org/10.1140/epjds/s13688-022-00321-1

author	Buskirk, Trent D. Blakely, Brian P. Eck, Adam McGrath, Richard Singh, Ravinder Yu, Youzhi
collection	PubMed
description	As survey costs continue to rise and response rates decline, researchers are seeking more cost-effective ways to collect, analyze and process social and public opinion data. These issues have created an opportunity and interest in expanding the fit-for-purpose paradigm to include alternate sources such as passively collected sensor data and social media data. However, methods for accessing, sourcing and sampling social media data are just now being developed. In fact, there has been a small but growing body of literature focusing on comparing different Twitter data access methods through either the elaborate firehose or the free Twitter search or streaming APIs. Missing from the literature is a good understanding of how to randomly sample Tweets to produce datasets that are representative of the daily discourse, especially within geographical regions of interest, without requiring a census of all Tweets. This understanding is necessary for producing quality estimates of public opinion from social media sources such as Twitter. To address this gap, we propose and test the Velocity-Based Estimation for Sampling Tweets (VBEST) algorithm for selecting a probability-based sample of tweets. We compare the performance of VBEST sample estimates to other methods of accessing Twitter through the Search API on the distribution of total Tweets as well as COVID-19 keyword incidence and frequency and find that the VBEST samples produce consistent and relatively low levels of overall bias compared to common methods of access through the Search API across many experimental conditions.
format	Online Article Text
id	pubmed-8857877
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-8857877 2022-02-22 Sweet tweets! Evaluating a new approach for probability-based sampling of Twitter Buskirk, Trent D. Blakely, Brian P. Eck, Adam McGrath, Richard Singh, Ravinder Yu, Youzhi EPJ Data Sci Regular Article
Springer Berlin Heidelberg 2022-02-19 2022 /pmc/articles/PMC8857877/ /pubmed/35223365 http://dx.doi.org/10.1140/epjds/s13688-022-00321-1 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
title	Sweet tweets! Evaluating a new approach for probability-based sampling of Twitter
topic	Regular Article

Similar Items