
Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection

BACKGROUND: Crowdsourcing has become a valuable method for collecting medical research data. This approach, recruiting through open calls on the Web, is particularly useful for assembling large normative datasets. However, it is not known how natural language datasets collected over the Web differ from those collected under controlled laboratory conditions.


Bibliographic Details
Main Authors: Saunders, Daniel R, Bex, Peter J, Woods, Russell L
Format: Online Article Text
Language: English
Published: JMIR Publications Inc. 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668615/
https://www.ncbi.nlm.nih.gov/pubmed/23689038
http://dx.doi.org/10.2196/jmir.2620
_version_ 1782271644635496448
author Saunders, Daniel R
Bex, Peter J
Woods, Russell L
author_facet Saunders, Daniel R
Bex, Peter J
Woods, Russell L
author_sort Saunders, Daniel R
collection PubMed
description BACKGROUND: Crowdsourcing has become a valuable method for collecting medical research data. This approach, recruiting through open calls on the Web, is particularly useful for assembling large normative datasets. However, it is not known how natural language datasets collected over the Web differ from those collected under controlled laboratory conditions. OBJECTIVE: To compare the natural language responses obtained from a crowdsourced sample of participants with responses collected in a conventional laboratory setting from participants recruited according to specific age and gender criteria. METHODS: We collected natural language descriptions of 200 half-minute movie clips, from Amazon Mechanical Turk workers (crowdsourced) and 60 participants recruited from the community (lab-sourced). Crowdsourced participants responded to as many clips as they wanted and typed their responses, whereas lab-sourced participants gave spoken responses to 40 clips, and their responses were transcribed. The content of the responses was evaluated using a take-one-out procedure, which compared responses to other responses to the same clip and to other clips, with a comparison of the average number of shared words. RESULTS: In contrast to the 13 months of recruiting that was required to collect normative data from 60 lab-sourced participants (with specific demographic characteristics), only 34 days were needed to collect normative data from 99 crowdsourced participants (contributing a median of 22 responses). The majority of crowdsourced workers were female, and the median age was 35 years, lower than the lab-sourced median of 62 years but similar to the median age of the US population. The responses contributed by the crowdsourced participants were longer on average, that is, 33 words compared to 28 words (P<.001), and they used a less varied vocabulary. However, there was strong similarity in the words used to describe a particular clip between the two datasets, as a cross-dataset count of shared words showed (P<.001). Within both datasets, responses contained substantial relevant content, with more words in common with responses to the same clip than to other clips (P<.001). There was evidence that responses from female and older crowdsourced participants had more shared words (P=.004 and .01 respectively), whereas younger participants had higher numbers of shared words in the lab-sourced population (P=.01). CONCLUSIONS: Crowdsourcing is an effective approach to quickly and economically collect a large reliable dataset of normative natural language responses.
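The take-one-out content measure described in the METHODS can be illustrated with a small sketch. The Python below is not the authors' code: the data layout (a dict mapping clip IDs to lists of response strings), the whitespace tokenization, and the function names are assumptions made for illustration. For each held-out response it counts words shared with the other responses to the same clip and with responses to other clips, then compares the two averages.

# Illustrative sketch only (hypothetical names and data): take-one-out
# shared-word comparison over responses stored as {clip_id: [response, ...]}.

def word_set(text):
    # Naive whitespace tokenization; the study's normalization may have differed.
    return set(text.lower().split())

def shared_word_scores(responses):
    """For each held-out response, return (average shared words with other
    responses to the same clip, average shared words with responses to other clips)."""
    scores = []
    clips = list(responses)
    for clip in clips:
        for i, held_out in enumerate(responses[clip]):
            held_words = word_set(held_out)
            # Words shared with the other responses to the same clip (take-one-out).
            same = [len(held_words & word_set(r))
                    for j, r in enumerate(responses[clip]) if j != i]
            # Words shared with responses to all other clips.
            other = [len(held_words & word_set(r))
                     for c in clips if c != clip for r in responses[c]]
            if same and other:
                scores.append((sum(same) / len(same), sum(other) / len(other)))
    return scores

# Hypothetical toy data standing in for the movie-clip descriptions.
responses = {
    "clip_01": ["a man walks a dog in the park",
                "someone walks their dog past a park bench"],
    "clip_02": ["two children play soccer on a field",
                "kids kick a ball around a grassy field"],
}
for same_avg, other_avg in shared_word_scores(responses):
    print(f"same clip: {same_avg:.1f} shared words on average; other clips: {other_avg:.1f}")

In the published study this comparison was made across both datasets and many responses per clip; the sketch only shows the shape of the computation, in which relevant content appears as more shared words within a clip than across clips.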
format Online
Article
Text
id pubmed-3668615
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher JMIR Publications Inc.
record_format MEDLINE/PubMed
spelling pubmed-3668615 2013-06-03 Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection Saunders, Daniel R Bex, Peter J Woods, Russell L J Med Internet Res Original Paper (abstract as in the description field above) JMIR Publications Inc. 2013-05-20 /pmc/articles/PMC3668615/ /pubmed/23689038 http://dx.doi.org/10.2196/jmir.2620 Text en ©Daniel R Saunders, Peter J Bex, Russell L Woods. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 20.05.2013. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited.
The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Saunders, Daniel R
Bex, Peter J
Woods, Russell L
Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection
title Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection
title_full Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection
title_fullStr Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection
title_full_unstemmed Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection
title_short Crowdsourcing a Normative Natural Language Dataset: A Comparison of Amazon Mechanical Turk and In-Lab Data Collection
title_sort crowdsourcing a normative natural language dataset: a comparison of amazon mechanical turk and in-lab data collection
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668615/
https://www.ncbi.nlm.nih.gov/pubmed/23689038
http://dx.doi.org/10.2196/jmir.2620
work_keys_str_mv AT saundersdanielr crowdsourcinganormativenaturallanguagedatasetacomparisonofamazonmechanicalturkandinlabdatacollection
AT bexpeterj crowdsourcinganormativenaturallanguagedatasetacomparisonofamazonmechanicalturkandinlabdatacollection
AT woodsrusselll crowdsourcinganormativenaturallanguagedatasetacomparisonofamazonmechanicalturkandinlabdatacollection