Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models

Bibliographic Details
Main authors:	Hong, Feng; Tian, Lu; Devanarayan, Viswanath
Format:	Online Article Text
Language:	English
Published:	2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10660556/ https://www.ncbi.nlm.nih.gov/pubmed/37990696 http://dx.doi.org/10.3390/math11030557

author	Hong, Feng; Tian, Lu; Devanarayan, Viswanath
collection	PubMed
description	High-dimensional data applications in biomedical research often use statistical and machine-learning algorithms to identify an optimal signature, based on biomarkers and other patient characteristics, that predicts the desired clinical outcome. Both the composition and the predictive performance of such biomarker signatures are critical in many biomedical research applications. When the number of features is large, however, a conventional regression analysis fails to yield a good prediction model. A widely used remedy is to introduce regularization when fitting the relevant regression model. In particular, an L1 penalty on the regression coefficients is extremely useful, and very efficient numerical algorithms have been developed for fitting such models with different types of responses. L1-based regularization tends to generate a parsimonious prediction model with promising prediction performance; that is, feature selection is achieved along with construction of the prediction model. Both the variable selection (and hence the composition of the signature) and the prediction performance of the model depend on the choice of the penalty parameter used in the L1 regularization. The penalty parameter is often chosen by K-fold cross-validation. However, this procedure tends to be unstable, because the random assignment of observations to folds differs from run to run, and it may yield very different choices of the penalty parameter across multiple runs on the same dataset. In addition, the predictive performance estimates from the internal cross-validation procedure tend to be inflated, because the same data are used both to tune the penalty and to assess the model. In this paper, we propose a Monte Carlo approach to improve the robustness of regularization parameter selection, along with an additional cross-validation wrapper for objectively evaluating the predictive performance of the final model. We demonstrate the improvements via simulations and illustrate the application via a real dataset.
format	Online Article Text
id	pubmed-10660556
institution	National Center for Biotechnology Information
language	English
publishDate	2023
record_format	MEDLINE/PubMed
spelling	pubmed-10660556 2023-11-21 Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models Hong, Feng; Tian, Lu; Devanarayan, Viswanath Mathematics (Basel) Article 2023-02 2023-01-20 /pmc/articles/PMC10660556/ /pubmed/37990696 http://dx.doi.org/10.3390/math11030557 Text en https://creativecommons.org/licenses/by/4.0/ This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10660556/ https://www.ncbi.nlm.nih.gov/pubmed/37990696 http://dx.doi.org/10.3390/math11030557
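The two proposals in the abstract (a Monte Carlo approach to choosing the penalty parameter, plus an outer cross-validation wrapper) can be illustrated with a minimal, dependency-free Python sketch. It rests on assumptions that are not taken from the paper: only the squared-error (Gaussian) lasso is covered, not the logistic or Cox models; the Monte Carlo step is assumed to mean repeating K-fold cross-validation over many random fold partitions and picking the penalty that minimizes the averaged CV error curve (the authors' actual aggregation rule may differ); and the wrapper is assumed to be ordinary nested cross-validation. All function names (fit_lasso, monte_carlo_lambda, nested_cv_mse and the helpers) are hypothetical.

```python
import random
import statistics


def soft_threshold(z, t):
    """Soft-thresholding operator (proximal step of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X, y, lam, n_sweeps=100, tol=1e-8):
    """Gaussian lasso by cyclic coordinate descent, minimizing
    (1/2n) * ||y - b0 - X beta||^2 + lam * ||beta||_1. X is a list of rows."""
    n, p = len(X), len(X[0])
    beta, b0 = [0.0] * p, sum(y) / n
    r = [yi - b0 for yi in y]  # residuals y - b0 - X beta, kept up to date
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                for i in range(n):
                    r[i] -= X[i][j] * (new - old)
                beta[j] = new
                max_step = max(max_step, abs(new - old))
        shift = sum(r) / n  # refit the unpenalized intercept
        b0 += shift
        r = [ri - shift for ri in r]
        if max_step < tol:  # coefficients have stopped moving
            break
    return b0, beta


def predict(model, X):
    b0, beta = model
    return [b0 + sum(x * b for x, b in zip(row, beta)) for row in X]


def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def argmin(values):
    return min(range(len(values)), key=values.__getitem__)


def folds(n, k, rng):
    """Randomly assign indices 0..n-1 to k folds; yield (train, test) lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    for f in range(k):
        test = set(idx[f::k])
        yield [i for i in idx if i not in test], sorted(test)


def cv_curve(X, y, lambdas, k, rng):
    """One K-fold CV run: mean held-out MSE for each lambda in the grid."""
    errs = [0.0] * len(lambdas)
    for train, test in folds(len(y), k, rng):
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte, yte = [X[i] for i in test], [y[i] for i in test]
        for m, lam in enumerate(lambdas):
            errs[m] += mse(yte, predict(fit_lasso(Xtr, ytr, lam), Xte)) / k
    return errs


def monte_carlo_lambda(X, y, lambdas, k=5, n_repeats=20, seed=0):
    """ASSUMED aggregation: average the CV curves over n_repeats random fold
    partitions and pick the lambda minimizing the averaged curve."""
    rng = random.Random(seed)
    curves = [cv_curve(X, y, lambdas, k, rng) for _ in range(n_repeats)]
    avg = [statistics.fmean(c[m] for c in curves) for m in range(len(lambdas))]
    per_run = [lambdas[argmin(c)] for c in curves]  # single-run CV choices
    return lambdas[argmin(avg)], per_run


def nested_cv_mse(X, y, lambdas, k_outer=5, seed=1, **inner):
    """Outer CV wrapper: redo the whole lambda selection inside each outer
    training fold, then score the refit on the untouched outer test fold."""
    errs = []
    for train, test in folds(len(y), k_outer, random.Random(seed)):
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        lam, _ = monte_carlo_lambda(Xtr, ytr, lambdas, **inner)
        model = fit_lasso(Xtr, ytr, lam)
        Xte, yte = [X[i] for i in test], [y[i] for i in test]
        errs.append(mse(yte, predict(model, Xte)))
    return statistics.fmean(errs)
```

For example, monte_carlo_lambda(X, y, [1.0, 0.5, 0.2, 0.1, 0.05, 0.01]) returns the averaged-curve choice together with the choice each single CV run would have made, so any instability of single-run cross-validation shows up as spread in the second value. nested_cv_mse(X, y, grid) gives a performance estimate in which the outer test fold never influences the choice of the penalty. Coordinate descent is used here because it is the strategy popularized by glmnet; pure Python simply keeps the sketch self-contained and is far slower than a dedicated solver.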

Similar Items