Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews

Bibliographic details
Main authors: Thomas, James; McDonald, Steve; Noel-Storr, Anna; Shemilt, Ian; Elliott, Julian; Mavergames, Chris; Marshall, Iain J.
Format: Online article, text
Language: English
Published: Elsevier 2021
Subjects: Original Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8168828/
https://www.ncbi.nlm.nih.gov/pubmed/33171275
http://dx.doi.org/10.1016/j.jclinepi.2020.11.003

Abstract
OBJECTIVES: This study developed, calibrated, and evaluated a machine learning classifier designed to reduce study identification workload in Cochrane for producing systematic reviews.

METHODS: A machine learning classifier for retrieving randomized controlled trials (RCTs) was developed (the “Cochrane RCT Classifier”), with the algorithm trained using a data set of title–abstract records from Embase, manually labeled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labeled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification.

RESULTS: The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs), and our bootstrap validation found the classifier had recall of 0.99 (95% confidence interval 0.98–0.99) and precision of 0.08 (95% confidence interval 0.06–0.12) in this data set. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than those more recently published.

CONCLUSIONS: The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane Reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.
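
The abstract describes setting a classification threshold on a separately labeled calibration set so that recall reaches a 99% target, then checking recall and precision by bootstrap. The Python sketch below illustrates that general idea only, under our own assumptions: the function names (`threshold_for_recall`, `recall_precision`, `bootstrap_validate`), the out-of-bag bootstrap design, the number of resamples and the synthetic beta-distributed scores are all hypothetical and are not taken from the Cochrane RCT Classifier, whose model and exact bootstrap procedure are not described in this record. For scale: with 1,587 RCTs in the calibration set, a 99% recall target means at least 1,572 of them must score at or above the chosen threshold.

```python
import math
import random
from typing import List, Sequence, Tuple


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float = 0.99) -> float:
    """Highest threshold t such that retrieving every record with score >= t
    reaches at least target_recall on the labeled calibration data."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("calibration data contain no positive (RCT) records")
    # Minimum number of positives that must be retrieved; the small epsilon
    # guards against float products such as 0.07 * 100 == 7.000000000000001.
    needed = math.ceil(target_recall * len(positives) - 1e-9)
    needed = min(max(needed, 1), len(positives))
    return positives[needed - 1]


def recall_precision(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Recall and precision when every record scoring >= threshold is retrieved."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else math.nan
    precision = tp / (tp + fp) if tp + fp else math.nan
    return recall, precision


def percentile_interval(values: List[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Simple percentile interval (no interpolation) over bootstrap replicates."""
    if not values:
        raise ValueError("no valid bootstrap replicates")
    v = sorted(values)
    return v[int((alpha / 2) * (len(v) - 1))], v[int((1 - alpha / 2) * (len(v) - 1))]


def bootstrap_validate(scores: Sequence[float], labels: Sequence[int],
                       target_recall: float = 0.99, n_boot: int = 200, seed: int = 0):
    """Hypothetical out-of-bag bootstrap: calibrate a threshold on each resample,
    evaluate it on the records left out, return (recall CI, precision CI)."""
    rng = random.Random(seed)
    n = len(scores)
    recalls: List[float] = []
    precisions: List[float] = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        chosen = set(in_bag)
        oob = [i for i in range(n) if i not in chosen]
        in_labels = [labels[i] for i in in_bag]
        if 1 not in in_labels or not oob:
            continue  # cannot calibrate or evaluate this replicate
        t = threshold_for_recall([scores[i] for i in in_bag], in_labels, target_recall)
        r, p = recall_precision([scores[i] for i in oob], [labels[i] for i in oob], t)
        if not (math.isnan(r) or math.isnan(p)):
            recalls.append(r)
            precisions.append(p)
    return percentile_interval(recalls), percentile_interval(precisions)


if __name__ == "__main__":
    # Synthetic, made-up scores for illustration; not the paper's data.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.05 else 0 for _ in range(5000)]
    scores = [rng.betavariate(5, 2) if y else rng.betavariate(2, 5) for y in labels]
    t = threshold_for_recall(scores, labels, 0.99)
    r, p = recall_precision(scores, labels, t)
    print(f"threshold={t:.3f} recall={r:.3f} precision={p:.3f}")
    print(bootstrap_validate(scores, labels, n_boot=100))
    # Arithmetic from the abstract's Cochrane Reviews evaluation set.
    print(f"{43_783 / 44_007:.1%} retrieved, {224 / 44_007:.1%} missed")
```

Choosing the threshold from the positive records alone makes recall the controlled quantity; precision (0.08 in the abstract's calibration data) is whatever that threshold then yields, which is the trade-off the abstract describes between workload and the risk of missing RCTs.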

Published in J Clin Epidemiol (Original Article), Elsevier, 2021-05.
© 2020 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).