It’s all relative: Regression analysis with compositional predictors
Compositional data reside in a simplex and measure fractions or proportions of parts to a whole. Most existing regression methods for such data rely on log-ratio transformations that are inadequate or inappropriate in modeling high-dimensional data with excessive zeros and hierarchical structures. M...
Main authors: | Li, Gen; Li, Yan; Chen, Kun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9767704/ https://www.ncbi.nlm.nih.gov/pubmed/35616500 http://dx.doi.org/10.1111/biom.13703 |
_version_ | 1784854017675886592 |
---|---|
author | Li, Gen Li, Yan Chen, Kun |
collection | PubMed |
description | Compositional data reside in a simplex and measure fractions or proportions of parts to a whole. Most existing regression methods for such data rely on log-ratio transformations that are inadequate or inappropriate in modeling high-dimensional data with excessive zeros and hierarchical structures. Moreover, such models usually lack a straightforward interpretation due to the interrelation between parts of a composition. We develop a novel relative-shift regression framework that directly uses proportions as predictors. The new framework provides a paradigm shift for regression analysis with compositional predictors and offers a superior interpretation of how shifting concentration between parts affects the response. New equi-sparsity and tree-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression. A unified finite-sample prediction error bound is derived for the proposed regularized estimators. We demonstrate the efficacy of the proposed methods in extensive simulation studies and a real gut microbiome study. Guided by the taxonomy of the microbiome data, the framework identifies important taxa at different taxonomic levels associated with the neurodevelopment of preterm infants. |
format | Online Article Text |
id | pubmed-9767704 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
record_format | MEDLINE/PubMed |
spelling | pubmed-9767704 2023-06-27 It’s all relative: Regression analysis with compositional predictors Li, Gen Li, Yan Chen, Kun Biometrics Article Compositional data reside in a simplex and measure fractions or proportions of parts to a whole. Most existing regression methods for such data rely on log-ratio transformations that are inadequate or inappropriate in modeling high-dimensional data with excessive zeros and hierarchical structures. Moreover, such models usually lack a straightforward interpretation due to the interrelation between parts of a composition. We develop a novel relative-shift regression framework that directly uses proportions as predictors. The new framework provides a paradigm shift for regression analysis with compositional predictors and offers a superior interpretation of how shifting concentration between parts affects the response. New equi-sparsity and tree-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression. A unified finite-sample prediction error bound is derived for the proposed regularized estimators. We demonstrate the efficacy of the proposed methods in extensive simulation studies and a real gut microbiome study. Guided by the taxonomy of the microbiome data, the framework identifies important taxa at different taxonomic levels associated with the neurodevelopment of preterm infants. 2023-06 2022-07-11 /pmc/articles/PMC9767704/ /pubmed/35616500 http://dx.doi.org/10.1111/biom.13703 Text en https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article under the terms of the Creative Commons Attribution-NonCommercial (https://creativecommons.org/licenses/by-nc/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. |
title | It’s all relative: Regression analysis with compositional predictors |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9767704/ https://www.ncbi.nlm.nih.gov/pubmed/35616500 http://dx.doi.org/10.1111/biom.13703 |