Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach

Bibliographic Details
Main Authors: Wallace, Byron C; Noel-Storr, Anna; Marshall, Iain J; Cohen, Aaron M; Smalheiser, Neil R; Thomas, James
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5975623/
https://www.ncbi.nlm.nih.gov/pubmed/28541493
http://dx.doi.org/10.1093/jamia/ocx053
Description
Summary: OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%–99% recall) with substantially less effort (we observed a reduction of around 60%–80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.
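
The routing rule described under METHODS (automatically exclude citations the classifier scores as very unlikely to be RCTs, send the rest to crowdworkers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Citation` class, the `route` and `evaluate` functions, the threshold of 0.05, and the ten toy records are all hypothetical and chosen only to show how recall and effort reduction trade off. The recall computed here is an upper bound, since it assumes crowdworkers correctly label every citation they receive.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    pmid: str      # hypothetical identifier
    score: float   # classifier's estimated probability that this is an RCT
    is_rct: bool   # toy gold label, used only for evaluation


def route(citations, threshold):
    """Split citations into (auto_excluded, sent_to_crowd) by classifier score."""
    excluded = [c for c in citations if c.score < threshold]
    crowd = [c for c in citations if c.score >= threshold]
    return excluded, crowd


def evaluate(citations, threshold):
    """Return (recall upper bound, fraction of citations never shown to humans)."""
    excluded, crowd = route(citations, threshold)
    total_rcts = sum(c.is_rct for c in citations)
    kept_rcts = sum(c.is_rct for c in crowd)
    recall = kept_rcts / total_rcts if total_rcts else 1.0
    effort_reduction = len(excluded) / len(citations) if citations else 0.0
    return recall, effort_reduction


# Toy data (invented for illustration, not taken from the study).
toy = [
    Citation("a", 0.95, True),
    Citation("b", 0.60, True),
    Citation("c", 0.30, False),
    Citation("d", 0.04, False),
    Citation("e", 0.02, False),
    Citation("f", 0.01, False),
    Citation("g", 0.03, True),   # an RCT the classifier scores too low
    Citation("h", 0.01, False),
    Citation("i", 0.45, True),
    Citation("j", 0.02, False),
]

recall, reduction = evaluate(toy, threshold=0.05)
print(f"recall upper bound: {recall:.2f}, effort reduction: {reduction:.2f}")
```

Lowering the threshold sends more citations to the crowd, which raises recall at the cost of a smaller effort reduction; raising it does the opposite. The toy numbers are deliberately small and do not reproduce the 95%–99% recall or 60%–80% reduction reported in the summary.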