Selection Consistency of Lasso-Based Procedures for Misspecified High-Dimensional Binary Model and Random Regressors
We consider selection of random predictors for a high-dimensional regression problem with a binary response for a general loss function. An important special case is when the binary model is semi-parametric and the response function is misspecified under a parametric model fit. When the true respons...
Main authors: | Kubkowski, Mariusz; Mielniczuk, Jan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516565/ https://www.ncbi.nlm.nih.gov/pubmed/33285928 http://dx.doi.org/10.3390/e22020153 |
_version_ | 1783587030904602624 |
---|---|
author | Kubkowski, Mariusz; Mielniczuk, Jan |
author_sort | Kubkowski, Mariusz |
collection | PubMed |
description | We consider the selection of random predictors for a high-dimensional regression problem with a binary response and a general loss function. An important special case is when the binary model is semi-parametric and the response function is misspecified under a parametric model fit. When the true response coincides with a postulated parametric response for a certain value of the parameter, we obtain a common framework for parametric inference. Both correct specification and misspecification are covered in this contribution. Variable selection in such a scenario aims at recovering, with large probability, the support of the minimizer of the associated risk. We propose a two-step Screening-Selection (SS) procedure, which consists of screening and ordering predictors by the Lasso method and then selecting the subset of predictors that minimizes the Generalized Information Criterion for the corresponding nested family of models. We prove consistency of the proposed selection method under conditions that allow for a much larger number of predictors than the number of observations. For the semi-parametric case, when the distribution of random predictors satisfies the linear regressions condition, the true and the estimated parameters are collinear and their common support can be consistently identified. This partly explains the robustness of selection procedures to misspecification of the response function. |
format | Online Article Text |
id | pubmed-7516565 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7516565 2020-11-09 Selection Consistency of Lasso-Based Procedures for Misspecified High-Dimensional Binary Model and Random Regressors Kubkowski, Mariusz; Mielniczuk, Jan Entropy (Basel) Article We consider the selection of random predictors for a high-dimensional regression problem with a binary response and a general loss function. An important special case is when the binary model is semi-parametric and the response function is misspecified under a parametric model fit. When the true response coincides with a postulated parametric response for a certain value of the parameter, we obtain a common framework for parametric inference. Both correct specification and misspecification are covered in this contribution. Variable selection in such a scenario aims at recovering, with large probability, the support of the minimizer of the associated risk. We propose a two-step Screening-Selection (SS) procedure, which consists of screening and ordering predictors by the Lasso method and then selecting the subset of predictors that minimizes the Generalized Information Criterion for the corresponding nested family of models. We prove consistency of the proposed selection method under conditions that allow for a much larger number of predictors than the number of observations. For the semi-parametric case, when the distribution of random predictors satisfies the linear regressions condition, the true and the estimated parameters are collinear and their common support can be consistently identified. This partly explains the robustness of selection procedures to misspecification of the response function. MDPI 2020-01-28 /pmc/articles/PMC7516565/ /pubmed/33285928 http://dx.doi.org/10.3390/e22020153 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | Selection Consistency of Lasso-Based Procedures for Misspecified High-Dimensional Binary Model and Random Regressors |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516565/ https://www.ncbi.nlm.nih.gov/pubmed/33285928 http://dx.doi.org/10.3390/e22020153 |
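
The two-step Screening-Selection (SS) procedure summarized in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the logistic loss as one instance of the general loss, a plain proximal-gradient optimizer with a fixed step size (suitable only for roughly standardized predictors), no intercept, and the penalty `a_n = log(n)` as one possible choice for the Generalized Information Criterion. All function names, the tuning value `lam=0.05`, and the simulated data are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, y, cols, lam=0.0, steps=200, lr=0.5):
    """Proximal gradient descent for the mean logistic loss plus
    lam * ||beta||_1, using only the columns in `cols` (assumed optimizer)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(sum(beta[j] * xi[j] for j in cols)) - yi
            for j in cols:
                grad[j] += r * xi[j] / n
        for j in cols:
            b = beta[j] - lr * grad[j]
            # Soft-thresholding; with lam = 0 this is plain gradient descent.
            beta[j] = math.copysign(max(abs(b) - lr * lam, 0.0), b)
    return beta


def neg_loglik(X, y, beta, cols):
    """Logistic negative log-likelihood (sum over observations)."""
    total = 0.0
    for xi, yi in zip(X, y):
        t = sum(beta[j] * xi[j] for j in cols)
        total += max(t, 0.0) + math.log1p(math.exp(-abs(t))) - yi * t
    return total


def screening_selection(X, y, lam, a_n):
    n, p = len(X), len(X[0])
    # Step 1 (screening): Lasso fit on all predictors; keep the nonzero
    # coefficients and order them by decreasing absolute value.
    beta_lasso = fit_logistic(X, y, list(range(p)), lam=lam)
    order = sorted((j for j in range(p) if beta_lasso[j] != 0.0),
                   key=lambda j: -abs(beta_lasso[j]))
    # Step 2 (selection): minimize GIC over the nested family
    # {}, {j1}, {j1, j2}, ... induced by the Lasso ordering.
    best_model = []
    best_gic = 2.0 * n * math.log(2.0)  # empty model: all probabilities 1/2
    for k in range(1, len(order) + 1):
        cols = order[:k]
        beta = fit_logistic(X, y, cols)  # unpenalized refit on the submodel
        gic = 2.0 * neg_loglik(X, y, beta, cols) + a_n * k
        if gic < best_gic:
            best_model, best_gic = cols, gic
    return order, best_model


# Toy data (hypothetical): 8 Gaussian predictors, only the first two matter.
random.seed(0)
n, p = 200, 8
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < sigmoid(2.0 * x[0] - 2.0 * x[1]) else 0 for x in X]

order, selected = screening_selection(X, y, lam=0.05, a_n=math.log(n))
print("Lasso ordering of screened predictors:", order)
print("GIC-selected model:", selected)
```

Because the candidate models are nested along the Lasso ordering, the second step fits at most as many models as there are screened predictors rather than searching all subsets, which is what keeps the selection step cheap when the number of predictors is large.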