
Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews


Bibliographic Details
Main Authors: Popoff, E., Besada, M., Jansen, J. P., Cope, S., Kanters, S.
Format: Online Article Text
Language: English
Publicado: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7734810/
https://www.ncbi.nlm.nih.gov/pubmed/33308292
http://dx.doi.org/10.1186/s13643-020-01520-5
_version_ 1783622537100394496
author Popoff, E.
Besada, M.
Jansen, J. P.
Cope, S.
Kanters, S.
author_facet Popoff, E.
Besada, M.
Jansen, J. P.
Cope, S.
Kanters, S.
author_sort Popoff, E.
collection PubMed
description BACKGROUND: Despite existing research on text mining and machine learning (ML) for title and abstract screening, the role of ML within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear, given the lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion, aligning with current practices, and to determine optimal parameter settings for feature-set generation and ML algorithms. METHODS: We used abstract and full-text selection data from five large SLRs (n = 3089 to 12,769 abstracts) across a variety of disease areas. Each SLR was split into training and test sets. We developed a multi-step algorithm to assign each citation to one of the following categories: included; excluded for each PICOS criterion; or unclassified. We used a bag-of-words approach for feature-set generation and compared support vector machines (SVMs), naïve Bayes (NB), and bagged classification and regression trees (CART) for classification. We also compared alternative training-set strategies: using the full data versus downsampling (i.e., reducing excludes to balance includes and excludes, because ML algorithms perform better with balanced data), and using inclusion/exclusion decisions from abstract versus full-text screening. Performance was compared in terms of specificity, sensitivity, accuracy, and matching the reason for exclusion. RESULTS: The best-fitting model (optimizing sensitivity and specificity) used the SVM algorithm with training data based on full-text decisions, downsampling, and exclusion of words occurring fewer than five times. Its sensitivity and specificity ranged from 94 to 100% and from 54 to 89%, respectively, across the five SLRs. On average, 75% of excluded citations were excluded with a reason, and 83% of these matched the reviewers’ original reason for exclusion. Sensitivity improved significantly when both downsampling and abstract decisions were used. CONCLUSIONS: ML algorithms can improve the efficiency of the SLR process, and the proposed algorithms could reduce the workload of a second reviewer by identifying exclusions with a relevant PICOS reason, thus aligning with HTA guidance. Downsampling can be used to improve study selection, and the improvements seen with full-text exclusions have implications for a learn-as-you-go approach. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13643-020-01520-5.
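The screening pipeline summarized in the abstract (bag-of-words features with words occurring fewer than five times excluded, downsampling of excluded citations to balance the classes, and an SVM classifier) can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic placeholder abstracts, not the authors' implementation or datasets:

```python
# Hedged sketch of the abstract's pipeline: bag-of-words features
# (min_df=5 mirrors the cutoff of excluding words occurring fewer than
# five times), downsampling of excludes to balance classes, and a
# linear SVM. Texts and labels below are synthetic placeholders.
import random

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

random.seed(0)

# Synthetic citations: imbalanced, as in real screening (far more excludes).
includes = ["randomized trial of drug efficacy in patients"] * 20
excludes = ["case report of a single patient"] * 80
texts = includes + excludes
labels = ["include"] * len(includes) + ["exclude"] * len(excludes)

# Downsample excludes so the training set is balanced (20 vs 20).
exclude_idx = [i for i, y in enumerate(labels) if y == "exclude"]
keep = set(random.sample(exclude_idx, len(includes)))
train_idx = [i for i, y in enumerate(labels) if labels[i] == "include" or i in keep]

# Bag-of-words feature set, dropping rare words (document frequency < 5).
vectorizer = CountVectorizer(min_df=5)
X_train = vectorizer.fit_transform([texts[i] for i in train_idx])
y_train = [labels[i] for i in train_idx]

# Linear SVM classifier, as in the paper's best-fitting model.
clf = LinearSVC()
clf.fit(X_train, y_train)

pred = clf.predict(vectorizer.transform(["randomized controlled trial of drug"]))
print(pred[0])
```

The paper's full algorithm additionally assigns a PICOS-based reason for each exclusion (a multi-step, multi-class setup); this sketch shows only the binary include/exclude step with the downsampling and feature-cutoff choices the abstract names.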
format Online
Article
Text
id pubmed-7734810
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7734810 2020-12-15 Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews Popoff, E. Besada, M. Jansen, J. P. Cope, S. Kanters, S. Syst Rev Methodology BACKGROUND: Despite existing research on text mining and machine learning (ML) for title and abstract screening, the role of ML within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear, given the lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion, aligning with current practices, and to determine optimal parameter settings for feature-set generation and ML algorithms. METHODS: We used abstract and full-text selection data from five large SLRs (n = 3089 to 12,769 abstracts) across a variety of disease areas. Each SLR was split into training and test sets. We developed a multi-step algorithm to assign each citation to one of the following categories: included; excluded for each PICOS criterion; or unclassified. We used a bag-of-words approach for feature-set generation and compared support vector machines (SVMs), naïve Bayes (NB), and bagged classification and regression trees (CART) for classification. We also compared alternative training-set strategies: using the full data versus downsampling (i.e., reducing excludes to balance includes and excludes, because ML algorithms perform better with balanced data), and using inclusion/exclusion decisions from abstract versus full-text screening. Performance was compared in terms of specificity, sensitivity, accuracy, and matching the reason for exclusion. RESULTS: The best-fitting model (optimizing sensitivity and specificity) used the SVM algorithm with training data based on full-text decisions, downsampling, and exclusion of words occurring fewer than five times.
Its sensitivity and specificity ranged from 94 to 100% and from 54 to 89%, respectively, across the five SLRs. On average, 75% of excluded citations were excluded with a reason, and 83% of these matched the reviewers’ original reason for exclusion. Sensitivity improved significantly when both downsampling and abstract decisions were used. CONCLUSIONS: ML algorithms can improve the efficiency of the SLR process, and the proposed algorithms could reduce the workload of a second reviewer by identifying exclusions with a relevant PICOS reason, thus aligning with HTA guidance. Downsampling can be used to improve study selection, and the improvements seen with full-text exclusions have implications for a learn-as-you-go approach. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13643-020-01520-5. BioMed Central 2020-12-13 /pmc/articles/PMC7734810/ /pubmed/33308292 http://dx.doi.org/10.1186/s13643-020-01520-5 Text en © The Author(s) 2020 Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology
Popoff, E.
Besada, M.
Jansen, J. P.
Cope, S.
Kanters, S.
Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews
title Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews
title_full Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews
title_fullStr Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews
title_full_unstemmed Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews
title_short Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews
title_sort aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7734810/
https://www.ncbi.nlm.nih.gov/pubmed/33308292
http://dx.doi.org/10.1186/s13643-020-01520-5
work_keys_str_mv AT popoffe aligningtextminingandmachinelearningalgorithmswithbestpracticesforstudyselectioninsystematicliteraturereviews
AT besadam aligningtextminingandmachinelearningalgorithmswithbestpracticesforstudyselectioninsystematicliteraturereviews
AT jansenjp aligningtextminingandmachinelearningalgorithmswithbestpracticesforstudyselectioninsystematicliteraturereviews
AT copes aligningtextminingandmachinelearningalgorithmswithbestpracticesforstudyselectioninsystematicliteraturereviews
AT kanterss aligningtextminingandmachinelearningalgorithmswithbestpracticesforstudyselectioninsystematicliteraturereviews