
Sparse prediction informed by genetic annotations using the logit normal prior for Bayesian regression tree ensembles

Using high‐dimensional genetic variants such as single nucleotide polymorphisms (SNP) to predict complex diseases and traits has important applications in basic research and other clinical settings. For example, predicting gene expression is a necessary first step to identify (putative) causal genes...


Bibliographic Details
Main authors:	Spanbauer, Charles; Pan, Wei
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2022
Subjects:	Research Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892284/ https://www.ncbi.nlm.nih.gov/pubmed/36349692 http://dx.doi.org/10.1002/gepi.22505

author	Spanbauer, Charles; Pan, Wei
collection	PubMed
description	Using high‐dimensional genetic variants such as single nucleotide polymorphisms (SNP) to predict complex diseases and traits has important applications in basic research and other clinical settings. For example, predicting gene expression is a necessary first step to identify (putative) causal genes in transcriptome‐wide association studies. Due to weak signals, high‐dimensionality, and linkage disequilibrium (correlation) among SNPs, building such a prediction model is challenging. However, functional annotations at the SNP level (e.g., as epigenomic data across multiple cell‐ or tissue‐types) are available and could be used to inform predictor importance and aid in outcome prediction. Existing approaches to incorporate annotations have been based mainly on (generalized) linear models. Bayesian additive regression trees (BART), in contrast, is a reliable method to obtain high‐quality nonlinear out of sample predictions without overfitting. Unfortunately, the default prior from BART may be too inflexible to handle sparse situations where the number of predictors approaches or surpasses the number of observations. Motivated by our real data application, this article proposes an alternative prior based on the logit normal distribution because it provides a framework that is adaptive to sparsity and can model informative functional annotations. It also provides a framework to incorporate prior information about the between SNP correlations. Computational details for carrying out inference are presented along with the results from a simulation study and a genome‐wide prediction analysis of the Alzheimer's Disease Neuroimaging Initiative data.
format	Online Article Text
id	pubmed-9892284
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9892284 2023-02-02 Genet Epidemiol Research Articles John Wiley and Sons Inc. 2022-11-09 2023-02 Text en © 2022 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC. This is an open access article under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title	Sparse prediction informed by genetic annotations using the logit normal prior for Bayesian regression tree ensembles
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892284/ https://www.ncbi.nlm.nih.gov/pubmed/36349692 http://dx.doi.org/10.1002/gepi.22505

