Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow
BACKGROUND: Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated “workflow” to conduct abstract screening for systematic reviews and other knowledge...
Main Authors: | Pham, Ba’; Jovanovic, Jelena; Bagheri, Ebrahim; Antony, Jesmin; Ashoor, Huda; Nguyen, Tam T.; Rios, Patricia; Robson, Reid; Thomas, Sonia M.; Watt, Jennifer; Straus, Sharon E.; Tricco, Andrea C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8152711/ https://www.ncbi.nlm.nih.gov/pubmed/34039433 http://dx.doi.org/10.1186/s13643-021-01700-x |
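The abstract in this record describes an iterative loop: a classifier proposes eligible abstracts, human reviewers screen them, newly eligible abstracts are used to find near neighbors that are also screened, and the model is updated until no new eligible abstracts turn up. The following is a minimal Python sketch of that control flow only. It is not the authors' R implementation: the word-set features, the Jaccard similarity, the one-nearest-example "classifier", and every name in it (`tokens`, `jaccard`, `predict_eligible`, `screen_iteratively`, `human_screen`, `k`) are assumptions chosen to keep the example self-contained.

```python
from typing import Callable


def tokens(text: str) -> frozenset[str]:
    """Toy feature extraction: the set of lower-cased words in a title-abstract."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Similarity between two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_eligible(doc, eligible, ineligible) -> bool:
    """Toy classifier: eligible if the closest labelled example is an eligible one."""
    best_pos = max((jaccard(doc, e) for e in eligible), default=0.0)
    best_neg = max((jaccard(doc, e) for e in ineligible), default=0.0)
    return best_pos > best_neg


def screen_iteratively(
    abstracts: dict[str, str],
    seed_labels: dict[str, bool],
    human_screen: Callable[[str], bool],
    k: int = 2,
) -> dict[str, bool]:
    """Repeat predict -> human screen -> near-neighbor screen until nothing new is found."""
    vectors = {doc_id: tokens(text) for doc_id, text in abstracts.items()}
    labels = dict(seed_labels)  # grows as reviewers screen more abstracts
    while True:
        eligible = [vectors[i] for i, y in labels.items() if y]
        ineligible = [vectors[i] for i, y in labels.items() if not y]
        predicted = [
            i for i in vectors
            if i not in labels and predict_eligible(vectors[i], eligible, ineligible)
        ]
        new_eligible = []
        for i in predicted:  # reviewers confirm the model's predictions
            labels[i] = human_screen(i)
            if labels[i]:
                new_eligible.append(i)
        for i in list(new_eligible):  # near neighbors of confirmed abstracts
            candidates = [
                j for j in vectors
                if j not in labels and jaccard(vectors[i], vectors[j]) > 0
            ]
            candidates.sort(key=lambda j: jaccard(vectors[i], vectors[j]), reverse=True)
            for j in candidates[:k]:
                labels[j] = human_screen(j)
                if labels[j]:
                    new_eligible.append(j)
        if not new_eligible:  # stopping rule: no new eligible abstracts this round
            return labels
```

Abstracts that are never predicted eligible and never surface as near neighbors stay out of the returned dictionary, which is where the workload reduction in such a loop would come from. A real implementation would replace the toy features and classifier with a proper text-mining pipeline and trained model.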
_version_ | 1783698651632107520 |
---|---|
author | Pham, Ba’ Jovanovic, Jelena Bagheri, Ebrahim Antony, Jesmin Ashoor, Huda Nguyen, Tam T. Rios, Patricia Robson, Reid Thomas, Sonia M. Watt, Jennifer Straus, Sharon E. Tricco, Andrea C. |
author_facet | Pham, Ba’ Jovanovic, Jelena Bagheri, Ebrahim Antony, Jesmin Ashoor, Huda Nguyen, Tam T. Rios, Patricia Robson, Reid Thomas, Sonia M. Watt, Jennifer Straus, Sharon E. Tricco, Andrea C. |
author_sort | Pham, Ba’ |
collection | PubMed |
description | BACKGROUND: Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated “workflow” to conduct abstract screening for systematic reviews and other knowledge synthesis methods. METHODS: We adopt widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data; and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for (“true”) eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, as well as their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type-1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (inclusion of all truly eligible abstracts), specificity (exclusion of all truly ineligible abstracts), precision (inclusion of all truly eligible abstracts among all abstracts screened as eligible), F1-score (harmonic average of sensitivity and precision), and accuracy (correctly predicted eligible or ineligible abstracts). Workload reduction was measured as the hours the workflow saved, given only a subset of abstracts needed human screening. RESULTS: With respect to the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, an F1-score of 79%/79%, 98%/97% accuracy, 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews. CONCLUSION: The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second that would likely not impact the review’s conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13643-021-01700-x. |
format | Online Article Text |
id | pubmed-8152711 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8152711 2021-05-28 Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow Pham, Ba’ Jovanovic, Jelena Bagheri, Ebrahim Antony, Jesmin Ashoor, Huda Nguyen, Tam T. Rios, Patricia Robson, Reid Thomas, Sonia M. Watt, Jennifer Straus, Sharon E. Tricco, Andrea C. Syst Rev Research BACKGROUND: Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated “workflow” to conduct abstract screening for systematic reviews and other knowledge synthesis methods. METHODS: We adopt widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data; and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for (“true”) eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, as well as their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type-1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (inclusion of all truly eligible abstracts), specificity (exclusion of all truly ineligible abstracts), precision (inclusion of all truly eligible abstracts among all abstracts screened as eligible), F1-score (harmonic average of sensitivity and precision), and accuracy (correctly predicted eligible or ineligible abstracts). Workload reduction was measured as the hours the workflow saved, given only a subset of abstracts needed human screening. RESULTS: With respect to the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, an F1-score of 79%/79%, 98%/97% accuracy, 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews. CONCLUSION: The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second that would likely not impact the review’s conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13643-021-01700-x. BioMed Central 2021-05-26 /pmc/articles/PMC8152711/ /pubmed/34039433 http://dx.doi.org/10.1186/s13643-021-01700-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Pham, Ba’ Jovanovic, Jelena Bagheri, Ebrahim Antony, Jesmin Ashoor, Huda Nguyen, Tam T. Rios, Patricia Robson, Reid Thomas, Sonia M. Watt, Jennifer Straus, Sharon E. Tricco, Andrea C. Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow |
title | Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8152711/ https://www.ncbi.nlm.nih.gov/pubmed/34039433 http://dx.doi.org/10.1186/s13643-021-01700-x |
work_keys_str_mv | AT phamba textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT jovanovicjelena textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT bagheriebrahim textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT antonyjesmin textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT ashoorhuda textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT nguyentamt textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT riospatricia textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT robsonreid textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT thomassoniam textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT wattjennifer textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT straussharone textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow AT triccoandreac textminingtosupportabstractscreeningforknowledgesynthesesasemiautomatedworkflow |
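The performance measures named in the description (sensitivity, specificity, precision, F1-score, accuracy) all follow from the four cells of a confusion matrix that compares the workflow's decisions with the reference screening. The Python sketch below shows the standard definitions; the class and function names (`ScreeningCounts`, `f1_score`, `screening_metrics`) are hypothetical, the record does not report the underlying counts, and the code assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Confusion-matrix cells, with the reference screening taken as truth."""
    tp: int  # truly eligible abstracts the workflow included
    fp: int  # truly ineligible abstracts the workflow included
    fn: int  # truly eligible abstracts the workflow missed
    tn: int  # truly ineligible abstracts the workflow excluded


def f1_score(sensitivity: float, precision: float) -> float:
    """Harmonic mean of sensitivity (recall) and precision."""
    return 2 * sensitivity * precision / (sensitivity + precision)


def screening_metrics(c: ScreeningCounts) -> dict[str, float]:
    """Standard screening measures; assumes no denominator is zero."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_score(sensitivity, precision),
        "accuracy": accuracy,
    }
```

As a consistency check on the rounded figures reported for the systematic review, `f1_score(0.88, 0.71)` evaluates to about 0.786, in line with the stated F1-score of 79%.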