
Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach

OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach us...


Bibliographic Details
Main Authors: Wallace, Byron C; Noel-Storr, Anna; Marshall, Iain J; Cohen, Aaron M; Smalheiser, Neil R; Thomas, James
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5975623/
https://www.ncbi.nlm.nih.gov/pubmed/28541493
http://dx.doi.org/10.1093/jamia/ocx053
_version_ 1783327026020614144
author Wallace, Byron C
Noel-Storr, Anna
Marshall, Iain J
Cohen, Aaron M
Smalheiser, Neil R
Thomas, James
author_sort Wallace, Byron C
collection PubMed
description OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%–99% recall) with substantially less effort (we observed a reduction of around 60%–80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.
format Online
Article
Text
id pubmed-5975623
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-59756232018-06-04 Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach Wallace, Byron C Noel-Storr, Anna Marshall, Iain J Cohen, Aaron M Smalheiser, Neil R Thomas, James J Am Med Inform Assoc Brief Communications OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%–99% recall) with substantially less effort (we observed a reduction of around 60%–80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks. Oxford University Press 2017-11 2017-05-25 /pmc/articles/PMC5975623/ /pubmed/28541493 http://dx.doi.org/10.1093/jamia/ocx053 Text en © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Brief Communications
Wallace, Byron C
Noel-Storr, Anna
Marshall, Iain J
Cohen, Aaron M
Smalheiser, Neil R
Thomas, James
Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach
title Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach
topic Brief Communications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5975623/
https://www.ncbi.nlm.nih.gov/pubmed/28541493
http://dx.doi.org/10.1093/jamia/ocx053
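
The triage strategy summarized in the METHODS part of the description (automatically exclude citations the classifier considers very unlikely to be RCTs, defer to crowdworkers otherwise) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `triage`, `score`, `ask_crowd`, and `workload_reduction`, and the threshold value `exclude_below=0.1`, are assumptions made here for illustration, and the record does not state which classifier or cutoff was used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Decision:
    citation_id: str
    is_rct: bool
    decided_by: str  # "classifier" or "crowd"


def triage(
    citations: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
    ask_crowd: Callable[[str], bool],
    exclude_below: float = 0.1,  # hypothetical cutoff, not taken from the paper
) -> List[Decision]:
    """Route each (citation_id, text) pair to the classifier or the crowd.

    `score` is assumed to return an estimated probability that the
    citation describes an RCT; `ask_crowd` is assumed to return the
    crowdworkers' aggregated label.
    """
    decisions = []
    for citation_id, text in citations:
        if score(text) < exclude_below:
            # Very unlikely to be an RCT: exclude without human review.
            decisions.append(Decision(citation_id, False, "classifier"))
        else:
            # Otherwise defer the decision to crowdworkers.
            decisions.append(Decision(citation_id, ask_crowd(text), "crowd"))
    return decisions


def workload_reduction(decisions: List[Decision]) -> float:
    """Fraction of citations that needed no manual screening."""
    if not decisions:
        return 0.0
    automatic = sum(1 for d in decisions if d.decided_by == "classifier")
    return automatic / len(decisions)
```

In this sketch, lowering `exclude_below` sends more citations to the crowd, which favors recall at the cost of manual effort. Raising it does the opposite. The fraction returned by `workload_reduction` is only a rough proxy for the effort reduction reported in the abstract, which may have been measured differently.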