External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients

Bibliographic Details
Main Authors: Eertink, Jakoba J., Heymans, Martijn W., Zwezerijnen, Gerben J. C., Zijlstra, Josée M., de Vet, Henrica C. W., Boellaard, Ronald
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9464671/
https://www.ncbi.nlm.nih.gov/pubmed/36089634
http://dx.doi.org/10.1186/s13550-022-00931-w
_version_ 1784787632150020096
author Eertink, Jakoba J.
Heymans, Martijn W.
Zwezerijnen, Gerben J. C.
Zijlstra, Josée M.
de Vet, Henrica C. W.
Boellaard, Ronald
author_facet Eertink, Jakoba J.
Heymans, Martijn W.
Zwezerijnen, Gerben J. C.
Zijlstra, Josée M.
de Vet, Henrica C. W.
Boellaard, Ronald
author_sort Eertink, Jakoba J.
collection PubMed
description AIM: Clinical prediction models need to be validated. In this study, we used simulation data to compare various internal and external validation approaches for validating models. METHODS: Data for 500 patients were simulated using the distributions of metabolic tumor volume, standardized uptake value, the maximal distance between the largest lesion and another lesion, WHO performance status and age observed in 296 diffuse large B cell lymphoma patients. These data were used to predict progression after 2 years based on an existing logistic regression model. Using the simulated data, we applied cross-validation, bootstrapping and a holdout set (n = 100). We simulated new external datasets (n = 100, n = 200, n = 500) and, in addition, (1) simulated stage-specific external datasets, (2) varied the cut-off for high-risk patients, (3) varied the false positive and false negative rates and (4) simulated a dataset with EARL2 characteristics. All internal and external simulations were repeated 100 times. Model performance was expressed as the cross-validated area under the curve (CV-AUC ± SD) and the calibration slope. RESULTS: Cross-validation (0.71 ± 0.06) and holdout (0.70 ± 0.07) resulted in comparable model performance, but uncertainty was higher with the holdout set. Bootstrapping resulted in a CV-AUC of 0.67 ± 0.02. The calibration slope was comparable across these internal validation approaches. Increasing the size of the test set resulted in more precise CV-AUC estimates and a smaller SD for the calibration slope. For stage-specific test datasets, the CV-AUC increased with increasing Ann Arbor stage. As expected, changing the cut-off for high-risk patients and the false positive and false negative rates influenced model performance, which is clearly reflected in the low calibration slope. The EARL2 dataset resulted in similar model performance and precision, but the calibration slope indicated overfitting. CONCLUSION: For small datasets, it is not advisable to use a holdout set or a very small external dataset with similar characteristics, because a single small test dataset suffers from large uncertainty. Repeated cross-validation using the full training dataset is therefore preferred. Our simulations also demonstrated that it is important to consider the impact of differences in patient population between training and test data, which may call for adjustment or stratification of relevant variables.
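The comparison described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' code: it does not use their fitted distributions, model coefficients or software, and every distribution, coefficient and feature range is a hypothetical stand-in for the quantities named in the abstract (MTV, SUV, lesion distance, WHO performance status, age). It only shows, using scikit-learn, the kind of comparison the study performs on simulated data: repeated cross-validation versus a single holdout split (n = 100) for estimating the AUC, plus a calibration-slope check on the holdout predictions.

```python
# Illustrative sketch only; not the authors' code. All distributions and
# coefficients below are hypothetical stand-ins for the quantities named
# in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500  # simulated patients, as in the abstract

X = np.column_stack([
    rng.lognormal(mean=5.0, sigma=1.0, size=n),  # metabolic tumor volume (hypothetical)
    rng.lognormal(mean=2.5, sigma=0.5, size=n),  # standardized uptake value (hypothetical)
    rng.lognormal(mean=3.0, sigma=1.0, size=n),  # distance largest lesion to another lesion
    rng.integers(0, 3, size=n),                  # WHO performance status
    rng.normal(62.0, 12.0, size=n),              # age in years
])

# Hypothetical "true" logistic model generating 2-year progression labels.
beta = np.array([0.004, 0.05, 0.01, 0.4, 0.02])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta - 4.5))))

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

def calibration_slope(y_true, p_hat):
    """Slope of a logistic regression of the outcome on the linear predictor
    (logit of the predicted probabilities): ~1 is well calibrated, <1 suggests
    overfitting."""
    p = np.clip(p_hat, 1e-6, 1 - 1e-6)
    lp = np.log(p / (1 - p)).reshape(-1, 1)
    refit = LogisticRegression(C=1e6, max_iter=5000)  # effectively unpenalized
    return refit.fit(lp, y_true).coef_[0, 0]

# Internal validation 1: repeated 5-fold cross-validation on the full dataset.
cv_aucs = [
    cross_val_score(model, X, y, scoring="roc_auc",
                    cv=StratifiedKFold(5, shuffle=True, random_state=rep)).mean()
    for rep in range(100)  # the abstract repeats every simulation 100 times
]

# Internal validation 2: a holdout test set of n = 100, repeated over random
# splits to show how much a single split can vary.
holdout_aucs, holdout_slopes = [], []
for rep in range(100):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=100, stratify=y, random_state=rep)
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    holdout_aucs.append(roc_auc_score(y_te, p))
    holdout_slopes.append(calibration_slope(y_te, p))

print(f"repeated CV AUC   : {np.mean(cv_aucs):.2f} +/- {np.std(cv_aucs):.2f}")
print(f"holdout AUC       : {np.mean(holdout_aucs):.2f} +/- {np.std(holdout_aucs):.2f}")
print(f"holdout cal. slope: {np.mean(holdout_slopes):.2f} +/- {np.std(holdout_slopes):.2f}")
```

Run repeatedly, the single-holdout AUC estimates typically spread more widely than the repeated cross-validation estimates, which is the pattern the abstract reports: a single small test set carries large uncertainty, whereas repeated cross-validation on the full dataset gives a more stable estimate.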
format Online
Article
Text
id pubmed-9464671
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-94646712022-09-13 External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients Eertink, Jakoba J. Heymans, Martijn W. Zwezerijnen, Gerben J. C. Zijlstra, Josée M. de Vet, Henrica C. W. Boellaard, Ronald EJNMMI Res Original Research AIM: Clinical prediction models need to be validated. In this study, we used simulation data to compare various internal and external validation approaches to validate models. METHODS: Data of 500 patients were simulated using distributions of metabolic tumor volume, standardized uptake value, the maximal distance between the largest lesion and another lesion, WHO performance status and age of 296 diffuse large B cell lymphoma patients. These data were used to predict progression after 2 years based on an existing logistic regression model. Using the simulated data, we applied cross-validation, bootstrapping and holdout (n = 100). We simulated new external datasets (n = 100, n = 200, n = 500) and simulated stage-specific external datasets (1), varied the cut-off for high-risk patients (2) and the false positive and false negative rates (3) and simulated a dataset with EARL2 characteristics (4). All internal and external simulations were repeated 100 times. Model performance was expressed as the cross-validated area under the curve (CV-AUC ± SD) and calibration slope. RESULTS: The cross-validation (0.71 ± 0.06) and holdout (0.70 ± 0.07) resulted in comparable model performances, but the model had a higher uncertainty using a holdout set. Bootstrapping resulted in a CV-AUC of 0.67 ± 0.02. The calibration slope was comparable for these internal validation approaches. Increasing the size of the test set resulted in more precise CV-AUC estimates and smaller SD for the calibration slope. For test datasets with different stages, the CV-AUC increased as Ann Arbor stages increased. As expected, changing the cut-off for high risk and false positive- and negative rates influenced the model performance, which is clearly shown by the low calibration slope. The EARL2 dataset resulted in similar model performance and precision, but calibration slope indicated overfitting. CONCLUSION: In case of small datasets, it is not advisable to use a holdout or a very small external dataset with similar characteristics. A single small testing dataset suffers from a large uncertainty. Therefore, repeated CV using the full training dataset is preferred instead. Our simulations also demonstrated that it is important to consider the impact of differences in patient population between training and test data, which may ask for adjustment or stratification of relevant variables. Springer Berlin Heidelberg 2022-09-11 /pmc/articles/PMC9464671/ /pubmed/36089634 http://dx.doi.org/10.1186/s13550-022-00931-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. 
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) .
spellingShingle Original Research
Eertink, Jakoba J.
Heymans, Martijn W.
Zwezerijnen, Gerben J. C.
Zijlstra, Josée M.
de Vet, Henrica C. W.
Boellaard, Ronald
External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients
title External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients
title_full External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients
title_fullStr External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients
title_full_unstemmed External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients
title_short External validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using PET data from DLBCL patients
title_sort external validation: a simulation study to compare cross-validation versus holdout or external testing to assess the performance of clinical prediction models using pet data from dlbcl patients
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9464671/
https://www.ncbi.nlm.nih.gov/pubmed/36089634
http://dx.doi.org/10.1186/s13550-022-00931-w
work_keys_str_mv AT eertinkjakobaj externalvalidationasimulationstudytocomparecrossvalidationversusholdoutorexternaltestingtoassesstheperformanceofclinicalpredictionmodelsusingpetdatafromdlbclpatients
AT heymansmartijnw externalvalidationasimulationstudytocomparecrossvalidationversusholdoutorexternaltestingtoassesstheperformanceofclinicalpredictionmodelsusingpetdatafromdlbclpatients
AT zwezerijnengerbenjc externalvalidationasimulationstudytocomparecrossvalidationversusholdoutorexternaltestingtoassesstheperformanceofclinicalpredictionmodelsusingpetdatafromdlbclpatients
AT zijlstrajoseem externalvalidationasimulationstudytocomparecrossvalidationversusholdoutorexternaltestingtoassesstheperformanceofclinicalpredictionmodelsusingpetdatafromdlbclpatients
AT devethenricacw externalvalidationasimulationstudytocomparecrossvalidationversusholdoutorexternaltestingtoassesstheperformanceofclinicalpredictionmodelsusingpetdatafromdlbclpatients
AT boellaardronald externalvalidationasimulationstudytocomparecrossvalidationversusholdoutorexternaltestingtoassesstheperformanceofclinicalpredictionmodelsusingpetdatafromdlbclpatients