
Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow

BACKGROUND: Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated “workflow” to conduct abstract screening for systematic reviews and other knowledge...


Bibliographic Details
Main Authors:	Pham, Ba’; Jovanovic, Jelena; Bagheri, Ebrahim; Antony, Jesmin; Ashoor, Huda; Nguyen, Tam T.; Rios, Patricia; Robson, Reid; Thomas, Sonia M.; Watt, Jennifer; Straus, Sharon E.; Tricco, Andrea C.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8152711/ https://www.ncbi.nlm.nih.gov/pubmed/34039433 http://dx.doi.org/10.1186/s13643-021-01700-x

_version_	1783698651632107520
author	Pham, Ba’; Jovanovic, Jelena; Bagheri, Ebrahim; Antony, Jesmin; Ashoor, Huda; Nguyen, Tam T.; Rios, Patricia; Robson, Reid; Thomas, Sonia M.; Watt, Jennifer; Straus, Sharon E.; Tricco, Andrea C.
author_sort	Pham, Ba’
collection	PubMed
description	BACKGROUND: Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated “workflow” to conduct abstract screening for systematic reviews and other knowledge synthesis methods. METHODS: We adopt widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data; and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for (“true”) eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, as well as their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type-1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (inclusion of all truly eligible abstracts), specificity (exclusion of all truly ineligible abstracts), precision (inclusion of all truly eligible abstracts among all abstracts screened as eligible), F1-score (harmonic average of sensitivity and precision), and accuracy (correctly predicted eligible or ineligible abstracts). Workload reduction was measured as the hours the workflow saved, given only a subset of abstracts needed human screening. 
RESULTS: With respect to the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, an F1-score of 79%/79%, 98%/97% accuracy, 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews. CONCLUSION: The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second that would likely not impact the review’s conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13643-021-01700-x.
format	Online Article Text
id	pubmed-8152711
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8152711 2021-05-28 Syst Rev Research BioMed Central 2021-05-26 /pmc/articles/PMC8152711/ /pubmed/34039433 http://dx.doi.org/10.1186/s13643-021-01700-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8152711/ https://www.ncbi.nlm.nih.gov/pubmed/34039433 http://dx.doi.org/10.1186/s13643-021-01700-x
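
The performance measures named in the description field (sensitivity, specificity, precision, F1-score, accuracy) can be illustrated with a minimal Python sketch. This is not the authors' code, which the abstract says was written in R. The function name screening_metrics and the confusion-matrix counts are hypothetical and chosen only for illustration; the only figures taken from the record are the rounded 88% sensitivity and 71% precision reported for the systematic review.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute screening measures from confusion-matrix counts.

    tp: eligible abstracts predicted eligible
    fp: ineligible abstracts predicted eligible
    tn: ineligible abstracts predicted ineligible
    fn: eligible abstracts predicted ineligible
    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of sensitivity and precision.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts (NOT taken from the paper), for illustration only.
m = screening_metrics(tp=88, fp=36, tn=9864, fn=12)

# Consistency check using the rounded figures reported for the systematic
# review: 88% sensitivity and 71% precision.
reported_f1 = 2 * 0.71 * 0.88 / (0.71 + 0.88)
```

Rounded to two decimals, reported_f1 is 0.79, which agrees with the 79% F1-score stated for the systematic review.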


Similar Items