Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach
OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach us...
Main Authors: | Wallace, Byron C; Noel-Storr, Anna; Marshall, Iain J; Cohen, Aaron M; Smalheiser, Neil R; Thomas, James |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Brief Communications |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5975623/ https://www.ncbi.nlm.nih.gov/pubmed/28541493 http://dx.doi.org/10.1093/jamia/ocx053 |
_version_ | 1783327026020614144 |
---|---|
author | Wallace, Byron C Noel-Storr, Anna Marshall, Iain J Cohen, Aaron M Smalheiser, Neil R Thomas, James |
author_facet | Wallace, Byron C Noel-Storr, Anna Marshall, Iain J Cohen, Aaron M Smalheiser, Neil R Thomas, James |
author_sort | Wallace, Byron C |
collection | PubMed |
description | OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%–99% recall) with substantially less effort (we observed a reduction of around 60%–80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks. |
format | Online Article Text |
id | pubmed-5975623 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5975623 2018-06-04 Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach Wallace, Byron C Noel-Storr, Anna Marshall, Iain J Cohen, Aaron M Smalheiser, Neil R Thomas, James J Am Med Inform Assoc Brief Communications OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%–99% recall) with substantially less effort (we observed a reduction of around 60%–80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks. Oxford University Press 2017-11 2017-05-25 /pmc/articles/PMC5975623/ /pubmed/28541493 http://dx.doi.org/10.1093/jamia/ocx053 Text en © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Brief Communications Wallace, Byron C Noel-Storr, Anna Marshall, Iain J Cohen, Aaron M Smalheiser, Neil R Thomas, James Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach |
title | Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach |
topic | Brief Communications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5975623/ https://www.ncbi.nlm.nih.gov/pubmed/28541493 http://dx.doi.org/10.1093/jamia/ocx053 |
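
The METHODS summary in the description field outlines a triage rule: citations the classifier deems very unlikely to be RCTs are excluded automatically, and all others are deferred to crowdworkers. The following is a minimal sketch of that rule only, not the authors' implementation. The `Citation` type, the `rct_probability` score, and the `exclude_below` cutoff of 0.1 are hypothetical names and values chosen for illustration; the record does not state the threshold or the classifier that was used.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    pmid: str
    # Hypothetical classifier output: estimated probability the citation is an RCT.
    rct_probability: float


def triage(citations, exclude_below=0.1):
    """Split citations into (auto_excluded, to_crowd).

    `exclude_below` is an assumed cutoff, not a value from the record.
    Citations scoring under it are excluded without human review;
    everything else is deferred to crowdworkers.
    """
    auto_excluded, to_crowd = [], []
    for citation in citations:
        if citation.rct_probability < exclude_below:
            auto_excluded.append(citation)
        else:
            to_crowd.append(citation)
    return auto_excluded, to_crowd


def workload_reduction(auto_excluded, to_crowd):
    """Fraction of citations that never reach a human screener."""
    total = len(auto_excluded) + len(to_crowd)
    return len(auto_excluded) / total if total else 0.0
```

Under these assumptions, lowering `exclude_below` sends more citations to the crowd (favoring recall), while raising it excludes more automatically (favoring reduced effort), which is the trade-off the RESULTS summary reports.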