Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk

Bibliographic Details
Main Authors: Shakeri Hossein Abad, Zahra, Butler, Gregory P, Thompson, Wendy, Lee, Joon
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8808350/
https://www.ncbi.nlm.nih.gov/pubmed/35040794
http://dx.doi.org/10.2196/28749
author Shakeri Hossein Abad, Zahra
Butler, Gregory P
Thompson, Wendy
Lee, Joon
author_facet Shakeri Hossein Abad, Zahra
Butler, Gregory P
Thompson, Wendy
Lee, Joon
author_sort Shakeri Hossein Abad, Zahra
collection PubMed
description BACKGROUND: Crowdsourcing services, such as Amazon Mechanical Turk (AMT), allow researchers to use the collective intelligence of a wide range of web users for labor-intensive tasks. Because the large volume of data and the quick turnaround time of the process make manual verification of the quality of the collected results difficult, many questions remain regarding the reliability of these resources for developing digital public health systems. OBJECTIVE: This study aims to explore and evaluate the application of crowdsourcing, generally, and AMT, specifically, for developing digital public health surveillance systems. METHODS: We collected 296,166 crowd-generated labels for 98,722 tweets, labeled by 610 AMT workers, to develop machine learning (ML) models for detecting behaviors related to physical activity, sedentary behavior, and sleep quality among Twitter users. To infer the ground truth labels and explore their quality, we studied 4 statistical consensus methods that are agnostic of task features and focus only on worker labeling behavior. Moreover, to model the meta-information associated with each labeling task and leverage the potential of context-sensitive data in the truth inference process, we developed 7 ML models, including traditional classifiers (offline and active), a deep learning–based classification model, and a hybrid convolutional neural network model. RESULTS: Although crowdsourcing-based studies in public health have often equated majority vote with quality, the results of our study, using a truth set of 9000 manually labeled tweets, showed that consensus-based inference models mask underlying uncertainty in data and overlook the importance of task meta-information. Our evaluations across the 3 data sets (physical activity, sedentary behavior, and sleep quality) showed that truth inference is a context-sensitive process, and none of the methods studied in this paper was consistently superior to the others in predicting the truth label. We also found that the performance of the ML models trained on crowd-labeled data was sensitive to the quality of these labels, and that poor-quality labels led to incorrect assessment of these models. Finally, we provide a set of practical recommendations to improve the quality and reliability of crowdsourced data. CONCLUSIONS: Our findings indicate the importance of the quality of crowd-generated labels in developing ML models designed for decision-making purposes, such as public health surveillance decisions. A combination of the inference models outlined and analyzed in this study could be used to quantitatively measure and improve the quality of crowd-generated labels for training ML models.
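The METHODS and RESULTS above turn on how a single truth label is inferred from several conflicting crowd labels. Below is a minimal Python sketch (hypothetical toy data and names, not the authors' code; the abstract does not name the paper's 4 specific consensus methods) contrasting plain majority vote with a worker-reliability-weighted vote estimated by a simple Dawid-Skene-style EM loop, a common representative of this class of methods.

from collections import Counter, defaultdict

# Hypothetical crowd labels: labels[(worker, tweet)] = binary label,
# e.g., 1 = "reports physical activity", 0 = "does not".
labels = {
    ("w1", "t1"): 1, ("w2", "t1"): 1, ("w3", "t1"): 0,
    ("w1", "t2"): 0, ("w2", "t2"): 1, ("w3", "t2"): 1,
    ("w1", "t3"): 1, ("w2", "t3"): 0, ("w3", "t3"): 0,
}

def majority_vote(labels):
    # Most frequent label per tweet; ties fall to insertion order.
    votes = defaultdict(Counter)
    for (worker, tweet), y in labels.items():
        votes[tweet][y] += 1
    return {t: c.most_common(1)[0][0] for t, c in votes.items()}

def weighted_vote(labels, iters=20):
    # EM-style loop: alternate between (E) re-estimating each tweet's
    # truth label with votes weighted by worker accuracy, and (M)
    # re-estimating each worker's accuracy as agreement with the
    # current truth estimate. Initialized from majority vote.
    truth = majority_vote(labels)
    acc = {w: 0.8 for (w, _) in labels}  # flat prior accuracy
    for _ in range(iters):
        score = defaultdict(lambda: defaultdict(float))
        for (w, t), y in labels.items():
            score[t][y] += acc[w]
        truth = {t: max(s, key=s.get) for t, s in score.items()}
        hits, total = Counter(), Counter()
        for (w, t), y in labels.items():
            total[w] += 1
            hits[w] += int(y == truth[t])
        acc = {w: hits[w] / total[w] for w in total}
    return truth, acc

print(majority_vote(labels))     # {'t1': 1, 't2': 1, 't3': 0}
print(weighted_vote(labels)[1])  # per-worker accuracy estimates

On this toy data both methods agree on the labels; the added value of the weighted scheme is the per-worker accuracy estimates (here w2 scores 1.0 and w1 only about 0.33), which can be used to down-weight or filter unreliable workers on larger, noisier data sets, consistent with the paper's finding that no single inference method dominates across contexts.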
format Online
Article
Text
id pubmed-8808350
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8808350 2022-02-04 Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk Shakeri Hossein Abad, Zahra; Butler, Gregory P; Thompson, Wendy; Lee, Joon. J Med Internet Res, Original Paper. [Abstract as in the description field above.] JMIR Publications 2022-01-18 /pmc/articles/PMC8808350/ /pubmed/35040794 http://dx.doi.org/10.2196/28749 Text en ©Zahra Shakeri Hossein Abad, Gregory P Butler, Wendy Thompson, Joon Lee. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.01.2022.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Shakeri Hossein Abad, Zahra
Butler, Gregory P
Thompson, Wendy
Lee, Joon
Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk
title Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk
title_full Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk
title_fullStr Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk
title_full_unstemmed Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk
title_short Crowdsourcing for Machine Learning in Public Health Surveillance: Lessons Learned From Amazon Mechanical Turk
title_sort crowdsourcing for machine learning in public health surveillance: lessons learned from amazon mechanical turk
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8808350/
https://www.ncbi.nlm.nih.gov/pubmed/35040794
http://dx.doi.org/10.2196/28749
work_keys_str_mv AT shakerihosseinabadzahra crowdsourcingformachinelearninginpublichealthsurveillancelessonslearnedfromamazonmechanicalturk
AT butlergregoryp crowdsourcingformachinelearninginpublichealthsurveillancelessonslearnedfromamazonmechanicalturk
AT thompsonwendy crowdsourcingformachinelearninginpublichealthsurveillancelessonslearnedfromamazonmechanicalturk
AT leejoon crowdsourcingformachinelearninginpublichealthsurveillancelessonslearnedfromamazonmechanicalturk