Improving GWAS discovery and genomic prediction accuracy in biobank data

Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biob...

Bibliographic Details
Main authors: Orliac, Etienne J., Trejo Banos, Daniel, Ojavee, Sven E., Läll, Kristi, Mägi, Reedik, Visscher, Peter M., Robinson, Matthew R.
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2022
Subjects: Biological Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9351350/
https://www.ncbi.nlm.nih.gov/pubmed/35905320
http://dx.doi.org/10.1073/pnas.2121279119
_version_ 1784762426203308032
author Orliac, Etienne J.
Trejo Banos, Daniel
Ojavee, Sven E.
Läll, Kristi
Mägi, Reedik
Visscher, Peter M.
Robinson, Matthew R.
author_facet Orliac, Etienne J.
Trejo Banos, Daniel
Ojavee, Sven E.
Läll, Kristi
Mägi, Reedik
Visscher, Peter M.
Robinson, Matthew R.
author_sort Orliac, Etienne J.
collection PubMed
description Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency–linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R² was 47% in a UK Biobank holdout sample, which was 76% of the estimated [Formula: see text]. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average [Formula: see text] value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.
format Online
Article
Text
id pubmed-9351350
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-9351350 2022-08-05 Improving GWAS discovery and genomic prediction accuracy in biobank data Orliac, Etienne J. Trejo Banos, Daniel Ojavee, Sven E. Läll, Kristi Mägi, Reedik Visscher, Peter M. Robinson, Matthew R. Proc Natl Acad Sci U S A Biological Sciences Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency–linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R² was 47% in a UK Biobank holdout sample, which was 76% of the estimated [Formula: see text]. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average [Formula: see text] value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.
National Academy of Sciences 2022-07-29 2022-08-02 /pmc/articles/PMC9351350/ /pubmed/35905320 http://dx.doi.org/10.1073/pnas.2121279119 Text en Copyright © 2022 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Biological Sciences
Orliac, Etienne J.
Trejo Banos, Daniel
Ojavee, Sven E.
Läll, Kristi
Mägi, Reedik
Visscher, Peter M.
Robinson, Matthew R.
Improving GWAS discovery and genomic prediction accuracy in biobank data
title Improving GWAS discovery and genomic prediction accuracy in biobank data
title_full Improving GWAS discovery and genomic prediction accuracy in biobank data
title_fullStr Improving GWAS discovery and genomic prediction accuracy in biobank data
title_full_unstemmed Improving GWAS discovery and genomic prediction accuracy in biobank data
title_short Improving GWAS discovery and genomic prediction accuracy in biobank data
title_sort improving gwas discovery and genomic prediction accuracy in biobank data
topic Biological Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9351350/
https://www.ncbi.nlm.nih.gov/pubmed/35905320
http://dx.doi.org/10.1073/pnas.2121279119
work_keys_str_mv AT orliacetiennej improvinggwasdiscoveryandgenomicpredictionaccuracyinbiobankdata
AT trejobanosdaniel improvinggwasdiscoveryandgenomicpredictionaccuracyinbiobankdata
AT ojaveesvene improvinggwasdiscoveryandgenomicpredictionaccuracyinbiobankdata
AT lallkristi improvinggwasdiscoveryandgenomicpredictionaccuracyinbiobankdata
AT magireedik improvinggwasdiscoveryandgenomicpredictionaccuracyinbiobankdata
AT visscherpeterm improvinggwasdiscoveryandgenomicpredictionaccuracyinbiobankdata
AT robinsonmatthewr improvinggwasdiscoveryandgenomicpredictionaccuracyinbiobankdata