The joint lasso: high-dimensional regression for group structured data

We consider high-dimensional regression over subgroups of observations. Our work is motivated by biomedical problems, where subsets of samples, representing for example disease subtypes, may differ with respect to underlying regression models. In the high-dimensional setting, estimating a different...

Bibliographic Details
Main authors: Dondelinger, Frank, Mukherjee, Sach
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7868060/
https://www.ncbi.nlm.nih.gov/pubmed/30192903
http://dx.doi.org/10.1093/biostatistics/kxy035
author Dondelinger, Frank
Mukherjee, Sach
collection PubMed
description We consider high-dimensional regression over subgroups of observations. Our work is motivated by biomedical problems, where subsets of samples, representing for example disease subtypes, may differ with respect to underlying regression models. In the high-dimensional setting, estimating a different model for each subgroup is challenging due to limited sample sizes. Focusing on the case in which subgroup-specific models may be expected to be similar but not necessarily identical, we treat subgroups as related problem instances and jointly estimate subgroup-specific regression coefficients. This is done in a penalized framework, combining an $\ell_1$ term with an additional term that penalizes differences between subgroup-specific coefficients. This gives solutions that are globally sparse but that allow information-sharing between the subgroups. We present algorithms for estimation and empirical results on simulated data and using Alzheimer’s disease, amyotrophic lateral sclerosis, and cancer datasets. These examples demonstrate the gains joint estimation can offer in prediction as well as in providing subgroup-specific sparsity patterns.
format Online
Article
Text
id pubmed-7868060
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7868060 2021-02-10 The joint lasso: high-dimensional regression for group structured data Dondelinger, Frank Mukherjee, Sach Biostatistics Articles Oxford University Press 2018-09-05 /pmc/articles/PMC7868060/ /pubmed/30192903 http://dx.doi.org/10.1093/biostatistics/kxy035 Text en © The Author 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The joint lasso: high-dimensional regression for group structured data
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7868060/
https://www.ncbi.nlm.nih.gov/pubmed/30192903
http://dx.doi.org/10.1093/biostatistics/kxy035