
Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx

Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype–phenotype association. Here we...


Bibliographic Details
Main Authors: Ko, Seyoon; Li, Ginny X; Choi, Hyungwon; Won, Joong-Ho
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8575036/
https://www.ncbi.nlm.nih.gov/pubmed/34254998
http://dx.doi.org/10.1093/bib/bbab256
_version_ 1784595605703622656
author Ko, Seyoon
Li, Ginny X
Choi, Hyungwon
Won, Joong-Ho
author_sort Ko, Seyoon
collection PubMed
description Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype–phenotype association. Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc, not during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementations.
format Online
Article
Text
id pubmed-8575036
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8575036 2021-11-09 Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx Ko, Seyoon Li, Ginny X Choi, Hyungwon Won, Joong-Ho Brief Bioinform Problem Solving Protocol Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype–phenotype association. Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc, not during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementations. Oxford University Press 2021-07-13 /pmc/articles/PMC8575036/ /pubmed/34254998 http://dx.doi.org/10.1093/bib/bbab256 Text en © The Author(s) 2021. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8575036/
https://www.ncbi.nlm.nih.gov/pubmed/34254998
http://dx.doi.org/10.1093/bib/bbab256
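Illustrative note on the abstract: the description above mentions group lasso models fitted "via latent variable group representation". The Python sketch below is one possible reading of that idea and is not ParProx's API or the authors' code. All function names, the unweighted penalty, and the toy numbers are assumptions made for illustration. It duplicates variables that belong to several groups so the groups become disjoint, applies block soft-thresholding (the standard proximal operator of the group lasso penalty) to each disjoint block, and sums the copies back into the original coefficients. Because the latent blocks do not share entries, each block update is independent of the others, which is what makes this step easy to parallelize.

```python
import math


def group_soft_threshold(block, threshold):
    """Proximal operator of threshold * ||block||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(x * x for x in block))
    if norm <= threshold:
        # The whole group is shrunk to zero, i.e. the group is deselected.
        return [0.0] * len(block)
    scale = 1.0 - threshold / norm
    return [scale * x for x in block]


def expand_overlapping_groups(groups):
    """Give every group its own copy of its variables (hypothetical helper).

    groups: list of lists of original variable indices, possibly overlapping.
    Returns (latent_to_original, latent_groups), where latent_groups are
    disjoint index lists into the latent coefficient vector.
    """
    latent_to_original = []
    latent_groups = []
    for members in groups:
        start = len(latent_to_original)
        latent_to_original.extend(members)
        latent_groups.append(list(range(start, start + len(members))))
    return latent_to_original, latent_groups


def prox_group_lasso(z, latent_groups, step, lam):
    """Apply block soft-thresholding to each disjoint latent group.

    Assumes an unweighted penalty lam * sum_g ||z_g||_2; each loop
    iteration touches only its own block.
    """
    out = list(z)
    for idx in latent_groups:
        shrunk = group_soft_threshold([z[i] for i in idx], step * lam)
        for i, value in zip(idx, shrunk):
            out[i] = value
    return out


def collapse(z, latent_to_original, n_original):
    """Sum the latent copies back into coefficients of the original variables."""
    beta = [0.0] * n_original
    for value, j in zip(z, latent_to_original):
        beta[j] += value
    return beta


# Toy example: variable 1 belongs to both groups.
latent_to_original, latent_groups = expand_overlapping_groups([[0, 1], [1, 2]])
z = prox_group_lasso([3.0, 4.0, 0.1, 0.1], latent_groups, step=0.5, lam=5.0)
print(collapse(z, latent_to_original, 3))  # prints [1.5, 2.0, 0.0]
```

In the toy example the second group has a small norm and is zeroed out as a whole, while the first group is shrunk but kept, so variable 1 stays in the model only through its copy in the first group. A full solver would alternate this proximal step with a gradient step on the loss (for example a Cox or logistic likelihood), which is omitted here.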