Selection Consistency of Lasso-Based Procedures for Misspecified High-Dimensional Binary Model and Random Regressors


Bibliographic Details
Main authors: Kubkowski, Mariusz; Mielniczuk, Jan
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516565/
https://www.ncbi.nlm.nih.gov/pubmed/33285928
http://dx.doi.org/10.3390/e22020153
collection PubMed
description We consider selection of random predictors for a high-dimensional regression problem with a binary response and a general loss function. An important special case is when the binary model is semi-parametric and the response function is misspecified under a parametric model fit. When the true response coincides with a postulated parametric response for a certain value of the parameter, we obtain a common framework for parametric inference. Both cases of correct specification and misspecification are covered in this contribution. Variable selection in such a scenario aims at recovering the support of the minimizer of the associated risk with large probability. We propose a two-step Screening-Selection (SS) procedure, which consists of screening and ordering predictors by the Lasso method and then selecting the subset of predictors that minimizes the Generalized Information Criterion for the corresponding nested family of models. We prove consistency of the proposed selection method under conditions that allow for a much larger number of predictors than the number of observations. For the semi-parametric case, when the distribution of random predictors satisfies the linear regressions condition, the true and the estimated parameters are collinear and their common support can be consistently identified. This partly explains the robustness of selection procedures to misspecification of the response function.
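The two-step idea in the description (Lasso screening and ordering, then minimizing a Generalized Information Criterion over the resulting nested models) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the logistic loss, the absence of an intercept, ordering by the magnitude of the Lasso coefficients, the plain proximal-gradient solver with a fixed step size (which presumes roughly standardized predictors), and the criterion written in the familiar -2·log-likelihood + a_n·k form, whose scaling may differ from the paper's. All function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v| (shrinks v toward zero by t).
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_logistic(X, y, cols, lam=0.0, steps=200, lr=0.5):
    # Proximal gradient descent on the mean logistic loss restricted to `cols`,
    # with l1 penalty lam * ||b||_1; lam=0 gives an unpenalized fit.
    # Assumption: no intercept, predictors roughly standardized.
    n = len(y)
    b = {j: 0.0 for j in cols}
    for _ in range(steps):
        grad = {j: 0.0 for j in cols}
        for xi, yi in zip(X, y):
            r = sigmoid(sum(b[j] * xi[j] for j in cols)) - yi
            for j in cols:
                grad[j] += r * xi[j] / n
        for j in cols:
            b[j] = soft_threshold(b[j] - lr * grad[j], lr * lam)
    return b


def log_likelihood(X, y, b):
    # Bernoulli log-likelihood of the fitted logistic model b (dict: column -> coef).
    ll = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(bj * xi[j] for j, bj in b.items()))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def gic(loglik, k, a_n):
    # Assumed convention: -2 * log-likelihood + a_n * (model size).
    return -2.0 * loglik + a_n * k


def screening_selection(X, y, lam, a_n):
    p = len(X[0])
    # Step 1 (screening): keep predictors with nonzero Lasso coefficients and
    # order them by decreasing absolute coefficient (ordering rule is an assumption).
    b_lasso = fit_logistic(X, y, range(p), lam=lam)
    order = sorted((j for j in range(p) if b_lasso[j] != 0.0),
                   key=lambda j: -abs(b_lasso[j]))
    # Step 2 (selection): minimize GIC over the nested family
    # {}, {order[0]}, {order[0], order[1]}, ...
    best_k, best_gic = 0, gic(log_likelihood(X, y, {}), 0, a_n)
    for k in range(1, len(order) + 1):
        b = fit_logistic(X, y, order[:k])
        value = gic(log_likelihood(X, y, b), k, a_n)
        if value < best_gic:
            best_k, best_gic = k, value
    return order, order[:best_k]


def demo(n=150, p=6, seed=0):
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Misspecified response: a Cauchy-CDF link generates y, but a logistic
    # model is fitted; only the first two predictors are active.
    y = [1 if rng.random() < 0.5 + math.atan(2.0 * x[0] - 1.5 * x[1]) / math.pi
         else 0 for x in X]
    return screening_selection(X, y, lam=0.05, a_n=math.log(n))


if __name__ == "__main__":
    order, selected = demo()
    print("Lasso ordering:", order)
    print("GIC-selected subset:", sorted(selected))
```

The demo draws Gaussian predictors, a standard example of a distribution satisfying a linear regressions condition, and fits a logistic model to data generated with a different response function. The choices a_n = log(n) (a BIC-type penalty) and lam = 0.05 are illustrative only and are not taken from the article.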
id pubmed-7516565
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7516565 2020-11-09 Kubkowski, Mariusz; Mielniczuk, Jan. Entropy (Basel). Article. MDPI 2020-01-28 /pmc/articles/PMC7516565/ /pubmed/33285928 http://dx.doi.org/10.3390/e22020153 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).