Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies

Bibliographic Details
Main Authors: Kawaguchi, Eric S., Li, Sisi, Weaver, Garrett M., Lewinger, Juan Pablo
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581069/
https://www.ncbi.nlm.nih.gov/pubmed/36274755
http://dx.doi.org/10.6339/21-jds1030
_version_ 1784812535347675136
author Kawaguchi, Eric S.
Li, Sisi
Weaver, Garrett M.
Lewinger, Juan Pablo
collection PubMed
description There is a great deal of prior knowledge about gene function and regulation in the form of annotations or prior results that, if directly integrated into individual prognostic or diagnostic studies, could improve predictive performance. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically only used post-analysis to aid in the interpretation of any findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of “meta features” to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation we show that the proposed method outperforms standard ridge regression and competing methods that integrate prior information, in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and breast cancer mortality based on gene expression features.
format Online
Article
Text
id pubmed-9581069
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-9581069 2023-01-01 Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies Kawaguchi, Eric S. Li, Sisi Weaver, Garrett M. Lewinger, Juan Pablo J Data Sci Article 2022-01 2021-12-13 /pmc/articles/PMC9581069/ /pubmed/36274755 http://dx.doi.org/10.6339/21-jds1030 Text en https://creativecommons.org/licenses/by/4.0/ Open access article under the CC BY (https://creativecommons.org/licenses/by/4.0/) license.
title Hierarchical Ridge Regression for Incorporating Prior Information in Genomic Studies
topic Article
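
The description field says the two-level model can be recast as a single-level regression and fit with cyclic coordinate descent, but this record does not give the objective. The sketch below is therefore only an illustration under a stated assumption: feature effects beta are shrunk toward Z @ alpha (Z being the meta-feature matrix) under the objective 0.5*||y - X beta||^2 + 0.5*lam1*||beta - Z alpha||^2 + 0.5*lam2*||alpha||^2. Substituting gamma = beta - Z alpha turns this into an ordinary ridge problem on the stacked design [X, XZ]. The function name, the penalty parameterization, and the simulated data are hypothetical and are not taken from the article.

```python
import numpy as np


def hierarchical_ridge_cd(X, y, Z, lam1, lam2, tol=1e-12, max_sweeps=200_000):
    """Cyclic coordinate descent for an ASSUMED two-level ridge objective:

        0.5*||y - X b||^2 + 0.5*lam1*||b - Z a||^2 + 0.5*lam2*||a||^2

    With g = b - Z a this is a single-level ridge problem in theta = (g, a)
    on the design D = [X, XZ], with penalty lam1 on g and lam2 on a.
    """
    p = X.shape[1]
    q = Z.shape[1]
    D = np.hstack([X, X @ Z])                      # single-level design matrix
    pen = np.concatenate([np.full(p, lam1), np.full(q, lam2)])
    col_sq = np.sum(D ** 2, axis=0)                # squared column norms of D
    theta = np.zeros(p + q)
    resid = np.array(y, dtype=float)               # residual y - D @ theta

    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p + q):
            old = theta[j]
            # Inner product of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = D[:, j] @ resid + col_sq[j] * old
            new = rho / (col_sq[j] + pen[j])       # ridge coordinate update
            if new != old:
                resid -= D[:, j] * (new - old)
                theta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:                       # stop once a full sweep is stable
            break

    gamma, alpha = theta[:p], theta[p:]
    beta = Z @ alpha + gamma                       # map back to feature-level effects
    return beta, alpha


# Hypothetical simulated data: 40 samples, 6 features, 2 meta features.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
Z = rng.standard_normal((6, 2))
y = rng.standard_normal(40)
beta_hat, alpha_hat = hierarchical_ridge_cd(X, y, Z, lam1=5.0, lam2=5.0)
```

Under this assumed objective, a very large lam2 forces alpha toward zero, so the fit approaches standard ridge regression with penalty lam1.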