Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool

BACKGROUND: Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We evaluated a machine learning and text mining tool’s ability to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extr...

Bibliographic Details
Main Authors:	Gates, Allison; Gates, Michelle; Sim, Shannon; Elliott, Sarah A.; Pillay, Jennifer; Hartling, Lisa
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369614/ https://www.ncbi.nlm.nih.gov/pubmed/34399684 http://dx.doi.org/10.1186/s12874-021-01354-2

_version_	1783739329391099904
author	Gates, Allison; Gates, Michelle; Sim, Shannon; Elliott, Sarah A.; Pillay, Jennifer; Hartling, Lisa
author_facet	Gates, Allison; Gates, Michelle; Sim, Shannon; Elliott, Sarah A.; Pillay, Jennifer; Hartling, Lisa
author_sort	Gates, Allison
collection	PubMed
description	BACKGROUND: Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We evaluated a machine learning and text mining tool’s ability to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extraction and verification. METHODS: For 75 randomized trials, we manually extracted and verified data for 21 data elements. We uploaded the randomized trials to an online machine learning and text mining tool, and quantified performance by evaluating its ability to identify the reporting of data elements (reported or not reported), and the relevance of the extracted sentences, fragments, and overall solutions. For each randomized trial, we measured the time to complete manual extraction and verification, and to review and amend the data extracted by the tool. We calculated the median (interquartile range [IQR]) time for manual and semi-automated data extraction, and overall time savings. RESULTS: The tool identified the reporting (reported or not reported) of data elements with median (IQR) 91% (75% to 99%) accuracy. Among the top five sentences for each data element at least one sentence was relevant in a median (IQR) 88% (83% to 99%) of cases. Among a median (IQR) 90% (86% to 97%) of relevant sentences, pertinent fragments had been highlighted by the tool; exact matches were unreliable (median (IQR) 52% [33% to 73%]). A median 48% of solutions were fully correct, but performance varied greatly across data elements (IQR 21% to 71%). Using ExaCT to assist the first reviewer resulted in a modest time savings compared with manual extraction by a single reviewer (17.9 vs. 21.6 h total extraction time across 75 randomized trials). CONCLUSIONS: Using ExaCT to assist with data extraction resulted in modest gains in efficiency compared with manual extraction. The tool was reliable for identifying the reporting of most data elements. The tool’s ability to identify at least one relevant sentence and highlight pertinent fragments was generally good, but changes to sentence selection and/or highlighting were often required. PROTOCOL: https://doi.org/10.7939/DVN/RQPJKS SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01354-2.
format	Online Article Text
id	pubmed-8369614
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8369614 2021-08-18 Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool Gates, Allison; Gates, Michelle; Sim, Shannon; Elliott, Sarah A.; Pillay, Jennifer; Hartling, Lisa BMC Med Res Methodol Research BioMed Central 2021-08-16 /pmc/articles/PMC8369614/ /pubmed/34399684 http://dx.doi.org/10.1186/s12874-021-01354-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool
title_sort	creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369614/ https://www.ncbi.nlm.nih.gov/pubmed/34399684 http://dx.doi.org/10.1186/s12874-021-01354-2
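The description field reports only total extraction times (17.9 h with ExaCT vs. 21.6 h manually, across 75 randomized trials). As a rough illustration, the sketch below derives the absolute and relative savings from those totals. The function name `time_savings` and the per-trial averages are assumptions introduced here: the averages are simple means computed from the totals, not the medians the study itself calculated.

```python
# Totals as reported in the record's description field.
MANUAL_HOURS = 21.6     # manual extraction by a single reviewer
ASSISTED_HOURS = 17.9   # first reviewer assisted by ExaCT
N_TRIALS = 75           # randomized trials in the evaluation


def time_savings(manual_h: float, assisted_h: float, n_trials: int) -> dict:
    """Summarize savings from total hours (hypothetical helper, not from the paper)."""
    saved_h = manual_h - assisted_h
    return {
        "saved_hours": round(saved_h, 1),
        "saved_percent": round(100 * saved_h / manual_h, 1),
        # Mean minutes per trial, derived from totals (the study reports medians).
        "manual_min_per_trial": round(60 * manual_h / n_trials, 1),
        "assisted_min_per_trial": round(60 * assisted_h / n_trials, 1),
    }


summary = time_savings(MANUAL_HOURS, ASSISTED_HOURS, N_TRIALS)
print(summary)
```

Under these assumptions the difference is 3.7 h, about 17% of the manual total, or roughly three minutes per trial on average, which is consistent with the record's description of the gain as modest.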
