
Quality control questions on Amazon’s Mechanical Turk (MTurk): A randomized trial of impact on the USAUDIT, PHQ-9, and GAD-7

Crowdsourced psychological and other biobehavioral research using platforms like Amazon’s Mechanical Turk (MTurk) is increasingly common, but it has proliferated more rapidly than studies to establish data-quality best practices. This study therefore investigated whether outcome scores for three common screening tools would differ significantly among MTurk workers who were subject to different sets of quality control checks. We conducted a single-stage, randomized controlled trial with equal allocation to each of four study arms: Arm 1 (Control Arm), Arm 2 (Bot/VPN Check), Arm 3 (Truthfulness/Attention Check), and Arm 4 (Stringent Arm – All Checks). Data collection was completed in Qualtrics, to which participants were referred from MTurk. Subjects (n = 1100) were recruited on November 20–21, 2020. Eligible workers were required to claim U.S. residency, have a successful task completion rate > 95%, have completed a minimum of 100 tasks, and have completed a maximum of 10,000 tasks. Participants completed the US-Alcohol Use Disorders Identification Test (USAUDIT), the Patient Health Questionnaire (PHQ-9), and a screener for Generalized Anxiety Disorder (GAD-7). We found that differing quality control approaches significantly, meaningfully, and directionally affected outcome scores on each of the screening tools. Most notably, workers in Arm 1 (Control) reported higher scores than those in Arms 3 and 4 for all tools, and a higher score than workers in Arm 2 for the PHQ-9. These data suggest that the use, or lack thereof, of quality control questions in crowdsourced research may substantively affect findings, as might the types of quality control items.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.3758/s13428-021-01665-8.
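The eligibility criteria reported in the abstract map directly onto MTurk's built-in worker qualifications. The following is a minimal, illustrative sketch (not the authors' code, which this record does not include) of how such a screen could be expressed with the boto3 MTurk client, assuming recruitment via an ExternalQuestion that redirects workers to the Qualtrics survey; the title, reward, durations, and XML file name are hypothetical.

```python
# Illustrative sketch only: an MTurk HIT whose QualificationRequirements
# mirror the eligibility criteria reported in the abstract (claimed U.S.
# residency, approval rate > 95%, and 100-10,000 completed tasks).
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

qualification_requirements = [
    {   # Worker locale must be the United States.
        "QualificationTypeId": "00000000000000000071",  # built-in Locale
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "US"}],
    },
    {   # Successful task completion (approval) rate > 95%.
        "QualificationTypeId": "000000000000000000L0",  # PercentAssignmentsApproved
        "Comparator": "GreaterThan",
        "IntegerValues": [95],
    },
    {   # At least 100 approved tasks...
        "QualificationTypeId": "00000000000000000040",  # NumberHITsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [100],
    },
    {   # ...and at most 10,000 approved tasks.
        "QualificationTypeId": "00000000000000000040",
        "Comparator": "LessThanOrEqualTo",
        "IntegerValues": [10000],
    },
]

hit = mturk.create_hit(
    Title="Brief health survey (completed in Qualtrics)",   # hypothetical
    Description="Answer a short series of screening questions.",
    Reward="1.00",                                           # hypothetical
    MaxAssignments=1100,          # n = 1100 across the four arms
    AssignmentDurationInSeconds=3600,
    LifetimeInSeconds=172800,     # a two-day window, per the Nov 20-21 collection
    QualificationRequirements=qualification_requirements,
    # Hypothetical ExternalQuestion XML that frames/redirects to Qualtrics.
    Question=open("external_question.xml").read(),
)
print(hit["HIT"]["HITId"])
```

Locale, PercentAssignmentsApproved, and NumberHITsApproved are MTurk's built-in system qualification types; listing NumberHITsApproved twice bounds the approved-task count from both sides (≥ 100 and ≤ 10,000), matching the reported minimum and maximum.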

Bibliographic Details
Main Authors: Agley, Jon; Xiao, Yunyu; Nolan, Rachael; Golzarri-Arroyo, Lilian
Format: Online Article Text
Language: English
Journal: Behav Res Methods
Published: Springer US, 2021 (published online August 6, 2021)
Collection: PubMed (National Center for Biotechnology Information)
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8344397/
https://www.ncbi.nlm.nih.gov/pubmed/34357539
http://dx.doi.org/10.3758/s13428-021-01665-8
Record Format: MEDLINE/PubMed
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.