
Block coordinate descent algorithm improves variable selection and estimation in error‐in‐variables regression

Medical research increasingly includes high‐dimensional regression modeling with a need for error‐in‐variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error‐corrected cross‐validation to enable error‐in‐variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high‐dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross‐validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate‐adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error‐in‐variables adjustments more accessible for high‐dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics‐facilitated personalized medicine research.
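The iterative block strategy the abstract describes — ordinary Lasso coordinate descent on the uncorrupted features, alternated with updates on the corrupted features that use a bias-corrected, PSD-projected Gram matrix — can be illustrated with a minimal sketch. This is not the authors' implementation or the BDCoCoLasso R package API; the function name `bd_cocolasso`, the diagonal measurement-error covariance, and the eigenvalue-clipping PSD projection are simplifying assumptions for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def bd_cocolasso(Z, W, y, sigma2_err, lam, n_iter=50):
    """Toy block coordinate descent for partially corrupted Lasso.

    Z          : (n, p) uncorrupted features
    W          : (n, q) corrupted features, W = X + measurement noise
    sigma2_err : (q,) known measurement-error variances (assumed diagonal)
    lam        : Lasso penalty level
    """
    n, p = Z.shape
    q = W.shape[1]
    beta_z = np.zeros(p)
    beta_w = np.zeros(q)

    # Bias-corrected Gram matrix for the corrupted block: W'W/n - Sigma_err.
    G_w = W.T @ W / n - np.diag(sigma2_err)
    # Project onto the nearest PSD matrix (here: clip eigenvalues),
    # mirroring the convex-conditioning idea of CoCoLasso.
    vals, vecs = np.linalg.eigh(G_w)
    G_w = vecs @ np.diag(np.clip(vals, 1e-6, None)) @ vecs.T
    G_z = Z.T @ Z / n

    for _ in range(n_iter):
        # (1) Uncorrupted block: standard Lasso coordinate descent,
        #     holding the corrupted-block coefficients fixed.
        r = y - W @ beta_w
        for j in range(p):
            partial = r - Z @ beta_z + Z[:, j] * beta_z[j]
            rho = Z[:, j] @ partial / n
            beta_z[j] = soft_threshold(rho, lam) / G_z[j, j]
        # (2) Corrupted block: coordinate descent on the corrected
        #     quadratic form, holding the uncorrupted block fixed.
        b = W.T @ (y - Z @ beta_z) / n
        for j in range(q):
            rho = b[j] - G_w[j] @ beta_w + G_w[j, j] * beta_w[j]
            beta_w[j] = soft_threshold(rho, lam) / G_w[j, j]
    return beta_z, beta_w
```

Separating the blocks this way means the expensive corrected-covariance machinery is only applied to the q corrupted columns, which is the source of the computational savings over running CoCoLasso on all p + q features.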

Bibliographic Details

Main Authors: Escribe, Célia; Lu, Tianyuan; Keller‐Baruch, Julyan; Forgetta, Vincenzo; Xiao, Bowei; Richards, J. Brent; Bhatnagar, Sahir; Oualkacha, Karim; Greenwood, Celia M. T.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc., 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9292988/
https://www.ncbi.nlm.nih.gov/pubmed/34468045
http://dx.doi.org/10.1002/gepi.22430
Published in: Genet Epidemiol (Research Articles), John Wiley and Sons Inc.; online 2021-09-01, December 2021 issue.
© 2021 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC. This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.