Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow

BACKGROUND: Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated “workflow” to conduct abstract screening for systematic reviews and other knowledge...


Bibliographic Details
Main authors: Pham, Ba’; Jovanovic, Jelena; Bagheri, Ebrahim; Antony, Jesmin; Ashoor, Huda; Nguyen, Tam T.; Rios, Patricia; Robson, Reid; Thomas, Sonia M.; Watt, Jennifer; Straus, Sharon E.; Tricco, Andrea C.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8152711/
https://www.ncbi.nlm.nih.gov/pubmed/34039433
http://dx.doi.org/10.1186/s13643-021-01700-x
_version_ 1783698651632107520
author Pham, Ba’
Jovanovic, Jelena
Bagheri, Ebrahim
Antony, Jesmin
Ashoor, Huda
Nguyen, Tam T.
Rios, Patricia
Robson, Reid
Thomas, Sonia M.
Watt, Jennifer
Straus, Sharon E.
Tricco, Andrea C.
author_sort Pham, Ba’
collection PubMed
description BACKGROUND: Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated “workflow” to conduct abstract screening for systematic reviews and other knowledge synthesis methods.
METHODS: We adopt widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data; and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for (“true”) eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, as well as their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type-1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (inclusion of all truly eligible abstracts), specificity (exclusion of all truly ineligible abstracts), precision (inclusion of all truly eligible abstracts among all abstracts screened as eligible), F1-score (harmonic average of sensitivity and precision), and accuracy (correctly predicted eligible or ineligible abstracts). Workload reduction was measured as the hours the workflow saved, given only a subset of abstracts needed human screening.
RESULTS: With respect to the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, an F1-score of 79%/79%, 98%/97% accuracy, 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews.
CONCLUSION: The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second that would likely not impact the review’s conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13643-021-01700-x.
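The performance measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (their workflow was implemented in R); the `Confusion` class, the function names, and the toy counts are assumptions made only for illustration. The sketch uses the standard definitions, in which precision is the share of abstracts predicted eligible that are truly eligible.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix counts for predicted vs. true eligibility."""
    tp: int  # truly eligible, predicted eligible
    fp: int  # truly ineligible, predicted eligible
    tn: int  # truly ineligible, predicted ineligible
    fn: int  # truly eligible, predicted ineligible


def sensitivity(c: Confusion) -> float:
    # Share of truly eligible abstracts that were predicted eligible.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Share of truly ineligible abstracts that were predicted ineligible.
    return c.tn / (c.tn + c.fp)


def precision(c: Confusion) -> float:
    # Share of abstracts predicted eligible that are truly eligible.
    return c.tp / (c.tp + c.fp)


def accuracy(c: Confusion) -> float:
    # Share of all abstracts whose prediction matches the true label.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_score(sens: float, prec: float) -> float:
    # Harmonic mean of sensitivity and precision.
    return 2 * sens * prec / (sens + prec)


# Toy counts, invented for illustration; they are NOT data from the paper.
toy = Confusion(tp=88, fp=36, tn=9864, fn=12)
print(f"sensitivity={sensitivity(toy):.3f} specificity={specificity(toy):.3f}")
print(f"precision={precision(toy):.3f} accuracy={accuracy(toy):.3f}")

# F1 recomputed from the rounded systematic-review figures in the abstract.
print(f"F1 from 88% sensitivity and 71% precision: {f1_score(0.88, 0.71):.3f}")
```

Applied to the rounded systematic-review figures (88% sensitivity, 71% precision), the harmonic mean comes to about 0.786, in line with the reported F1-score of 79%.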
format Online
Article
Text
id pubmed-8152711
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8152711 2021-05-28 Syst Rev Research BioMed Central 2021-05-26 /pmc/articles/PMC8152711/ /pubmed/34039433 http://dx.doi.org/10.1186/s13643-021-01700-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8152711/
https://www.ncbi.nlm.nih.gov/pubmed/34039433
http://dx.doi.org/10.1186/s13643-021-01700-x