Performance of binary prediction models in high-correlation low-dimensional settings: a comparison of methods

BACKGROUND: Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity of the prediction model. Collinearity can be dealt...

Bibliographic Details
Main Authors:	Leeuwenberg, Artuur M., van Smeden, Maarten, Langendijk, Johannes A., van der Schaaf, Arjen, Mauer, Murielle E., Moons, Karel G. M., Reitsma, Johannes B., Schuit, Ewoud
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8751246/ https://www.ncbi.nlm.nih.gov/pubmed/35016734 http://dx.doi.org/10.1186/s41512-021-00115-5

author	Leeuwenberg, Artuur M.; van Smeden, Maarten; Langendijk, Johannes A.; van der Schaaf, Arjen; Mauer, Murielle E.; Moons, Karel G. M.; Reitsma, Johannes B.; Schuit, Ewoud
author_sort	Leeuwenberg, Artuur M.
collection	PubMed
description	BACKGROUND: Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity of the prediction model. Collinearity can be dealt with by exclusion of collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate. METHODS: We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations. RESULTS: In the conducted simulations, no effect of collinearity was observed on predictive outcomes (AUC, R², Intercept, Slope) across methods. However, a negative effect of collinearity on the stability of predictor selection was found, affecting all compared methods, but in particular methods that perform strong predictor selection (e.g., Lasso). Methods for which the included set of predictors remained most stable under increased collinearity were Ridge, PCLR, LAELR, and Dropout. CONCLUSIONS: Based on the results, we would recommend refraining from data-driven predictor selection approaches in the presence of high collinearity, because of the increased instability of predictor selection, even in relatively high events-per-variable settings. The selection of certain predictors over others may disproportionally give the impression that included predictors have a stronger association with the outcome than excluded predictors. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s41512-021-00115-5.
format	Online Article Text
id	pubmed-8751246
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8751246 2022-01-11 Diagn Progn Res Research BioMed Central 2022-01-11 /pmc/articles/PMC8751246/ /pubmed/35016734 http://dx.doi.org/10.1186/s41512-021-00115-5 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Performance of binary prediction models in high-correlation low-dimensional settings: a comparison of methods
topic	Research

Similar Items