Ensemble Linear Subspace Analysis of High-Dimensional Data
Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.
Main Authors: Ahmed, S. Ejaz; Amiri, Saeid; Doksum, Kjell
Format: Online Article Text
Language: English
Published: MDPI, 2021
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7998555/ https://www.ncbi.nlm.nih.gov/pubmed/33803346 http://dx.doi.org/10.3390/e23030324
_version_ | 1783670578628001792 |
author | Ahmed, S. Ejaz; Amiri, Saeid; Doksum, Kjell
author_facet | Ahmed, S. Ejaz; Amiri, Saeid; Doksum, Kjell
author_sort | Ahmed, S. Ejaz |
collection | PubMed |
description | Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor. |
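The ensemble subspace procedure summarized in the description above can be sketched in a few lines: draw random covariate subsets smaller than n, fit a cross-validated Lasso on each, and merge the per-subset predictions with a trimmed average. This is a minimal illustration of the general idea, not the authors' exact procedure; the subset size, number of subspaces, and trimming fraction below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import trim_mean
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# High-dimensional setup: p covariates exceed the sample size n.
n, p = 100, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 2.0                       # a few covariates strongly associated with the response
y = X @ beta + rng.standard_normal(n)
X_new = rng.standard_normal((20, p))  # test points to predict

B, m = 50, 40                         # B random subspaces, each with m < n covariates
preds = np.empty((B, X_new.shape[0]))
for b in range(B):
    subset = rng.choice(p, size=m, replace=False)
    # Tuning parameter chosen by cross-validation within each subspace.
    model = LassoCV(cv=5).fit(X[:, subset], y)
    preds[b] = model.predict(X_new[:, subset])

# Merge step: trimmed average of the individual subspace predictors.
y_hat = trim_mean(preds, proportiontocut=0.1, axis=0)
print(y_hat.shape)  # one merged prediction per test point
```

Averaging over many small random subspaces lets each Lasso fit work in a regime where the subset size is below n, which is the setting the abstract identifies as favorable when many covariates carry signal.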
format | Online Article Text |
id | pubmed-7998555 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-79985552021-03-28 Ensemble Linear Subspace Analysis of High-Dimensional Data Ahmed, S. Ejaz Amiri, Saeid Doksum, Kjell Entropy (Basel) Article Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor. MDPI 2021-03-09 /pmc/articles/PMC7998555/ /pubmed/33803346 http://dx.doi.org/10.3390/e23030324 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Ahmed, S. Ejaz Amiri, Saeid Doksum, Kjell Ensemble Linear Subspace Analysis of High-Dimensional Data |
title | Ensemble Linear Subspace Analysis of High-Dimensional Data |
title_full | Ensemble Linear Subspace Analysis of High-Dimensional Data |
title_fullStr | Ensemble Linear Subspace Analysis of High-Dimensional Data |
title_full_unstemmed | Ensemble Linear Subspace Analysis of High-Dimensional Data |
title_short | Ensemble Linear Subspace Analysis of High-Dimensional Data |
title_sort | ensemble linear subspace analysis of high-dimensional data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7998555/ https://www.ncbi.nlm.nih.gov/pubmed/33803346 http://dx.doi.org/10.3390/e23030324 |
work_keys_str_mv | AT ahmedsejaz ensemblelinearsubspaceanalysisofhighdimensionaldata AT amirisaeid ensemblelinearsubspaceanalysisofhighdimensionaldata AT doksumkjell ensemblelinearsubspaceanalysisofhighdimensionaldata |