
Training sample selection: Impact on screening automation in diagnostic test accuracy reviews

Bibliographic Details
Main authors: van Altena, Allard J., Spijker, René, Leeflang, Mariska M. G., Olabarriaga, Sílvia Delgado
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9292892/
https://www.ncbi.nlm.nih.gov/pubmed/34390193
http://dx.doi.org/10.1002/jrsm.1518
author van Altena, Allard J.
Spijker, René
Leeflang, Mariska M. G.
Olabarriaga, Sílvia Delgado
collection PubMed
description When performing a systematic review, researchers screen the articles retrieved after a broad search strategy one by one, which is time‐consuming. Computerised support of this screening process has been applied with varying success. This is partly due to the dependency on large amounts of data to develop models that predict inclusion. In this paper, we present an approach to choose which data to use in model training and compare it with established approaches. We used a dataset of 50 Cochrane diagnostic test accuracy reviews, and each was used as a target review. From the remaining 49 reviews, we selected those that most closely resembled the target review's clinical topic using the cosine similarity metric. Included and excluded studies from these selected reviews were then used to develop our prediction models. The performance of models trained on the selected reviews was compared against models trained on studies from all available reviews. The prediction models performed best with a larger number of reviews in the training set and on target reviews that had a research subject similar to other reviews in the dataset. Our approach using cosine similarity may reduce computational costs for model training and the duration of the screening process.
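The selection step described above (ranking the other reviews by cosine similarity to the target review and keeping the closest ones) can be illustrated with a minimal sketch. The record does not say how review topics were turned into vectors, so the raw term counts used here are an assumption, and the function names, review ids, and topic texts are hypothetical toy values rather than anything from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed representation, not the paper's)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def select_training_reviews(target_id, reviews, k):
    """Return the ids of the k reviews most similar to the target review."""
    target_vec = term_vector(reviews[target_id])
    scored = [
        (cosine_similarity(target_vec, term_vector(text)), review_id)
        for review_id, text in reviews.items()
        if review_id != target_id  # the target never trains on itself
    ]
    # Highest similarity first; ties broken by review id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [review_id for _, review_id in scored[:k]]


# Hypothetical toy data: made-up review ids and topic descriptions.
reviews = {
    "R1": "ultrasound for diagnosing appendicitis in children",
    "R2": "computed tomography for diagnosing appendicitis in adults",
    "R3": "rapid antigen tests for malaria in endemic settings",
    "R4": "microscopy and rapid tests for malaria diagnosis",
}

selected = select_training_reviews("R1", reviews, k=2)
print(selected)
```

In this sketch the included and excluded studies of the selected reviews would then form the training set for the target review's prediction model; the value of k is a free parameter here, not one taken from the record.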
format Online
Article
Text
id pubmed-9292892
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9292892 2022-07-20 Res Synth Methods Research Articles John Wiley and Sons Inc. 2021-08-25 2021-11 /pmc/articles/PMC9292892/ /pubmed/34390193 http://dx.doi.org/10.1002/jrsm.1518 Text en © 2021 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title Training sample selection: Impact on screening automation in diagnostic test accuracy reviews
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9292892/
https://www.ncbi.nlm.nih.gov/pubmed/34390193
http://dx.doi.org/10.1002/jrsm.1518