A scalable estimator of SNP heritability for biobank-scale data
MOTIVATION: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to u...
Main Authors: | Wu, Yue; Sankararaman, Sriram |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022682/ https://www.ncbi.nlm.nih.gov/pubmed/29950019 http://dx.doi.org/10.1093/bioinformatics/bty253 |
_version_ | 1783335730183929856 |
---|---|
author | Wu, Yue Sankararaman, Sriram |
author_facet | Wu, Yue Sankararaman, Sriram |
author_sort | Wu, Yue |
collection | PubMed |
description | MOTIVATION: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation data have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens. RESULTS: We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moment estimator that has a runtime complexity [Formula: see text] for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to [Formula: see text]. We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min. AVAILABILITY AND IMPLEMENTATION: The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg. |
format | Online Article Text |
id | pubmed-6022682 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6022682 2018-07-05 A scalable estimator of SNP heritability for biobank-scale data Wu, Yue Sankararaman, Sriram Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022682/ /pubmed/29950019 http://dx.doi.org/10.1093/bioinformatics/bty253 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Wu, Yue Sankararaman, Sriram A scalable estimator of SNP heritability for biobank-scale data |
title | A scalable estimator of SNP heritability for biobank-scale data |
title_full | A scalable estimator of SNP heritability for biobank-scale data |
title_fullStr | A scalable estimator of SNP heritability for biobank-scale data |
title_full_unstemmed | A scalable estimator of SNP heritability for biobank-scale data |
title_short | A scalable estimator of SNP heritability for biobank-scale data |
title_sort | scalable estimator of snp heritability for biobank-scale data |
topic | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022682/ https://www.ncbi.nlm.nih.gov/pubmed/29950019 http://dx.doi.org/10.1093/bioinformatics/bty253 |
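
The description field above mentions a method-of-moments estimator of LMM variance components that relies on B random matrix-vector multiplications. The following is a minimal sketch of that general idea, not the RHE-reg implementation: it assumes the standard moment equations for the model Cov(y) = σg² K + σe² I with K = X Xᵀ / M, and it estimates tr(K²) with random Gaussian probe vectors. The function names `simulate` and `mom_heritability`, the Gaussian probes, and the simulated genotypes are illustrative assumptions; the additional speed-up from genotype-matrix structure that the record mentions is not attempted here.

```python
import numpy as np


def simulate(n, m, h2, seed=0):
    """Simulate standardized genotypes and a centered phenotype (illustrative only)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, size=m)                   # assumed allele frequencies
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)   # 0/1/2 genotype counts
    X = (G - G.mean(axis=0)) / G.std(axis=0)                # standardize each SNP
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)         # per-SNP effects
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return X, y - y.mean()


def mom_heritability(X, y, B=10, seed=0):
    """Method-of-moments heritability with a randomized estimate of tr(K^2).

    X: (N, M) standardized genotype matrix; y: (N,) centered phenotype;
    B: number of random probe vectors.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape

    # For z ~ N(0, I), E[||K z||^2] = tr(K^2); average over B probes.
    # K is never formed: each probe needs two products with X, O(N*M) apiece.
    Z = rng.standard_normal((N, B))
    KZ = X @ (X.T @ Z) / M
    tr_K2 = np.sum(KZ ** 2) / B

    tr_K = np.sum(X ** 2) / M        # exact tr(K); equals N for standardized X
    Xty = X.T @ y
    yKy = Xty @ Xty / M              # y^T K y without forming K
    yy = y @ y

    # Normal equations of the least-squares fit of y y^T to sg2*K + se2*I.
    A = np.array([[tr_K2, tr_K], [tr_K, float(N)]])
    b = np.array([yKy, yy])
    sigma_g2, sigma_e2 = np.linalg.solve(A, b)
    return sigma_g2 / (sigma_g2 + sigma_e2)
```

As a usage example, `X, y = simulate(2000, 1000, 0.5, seed=1)` followed by `mom_heritability(X, y, B=20, seed=2)` should return an estimate in the neighborhood of the simulated value 0.5, with the spread depending on N, M, and B. Avoiding the N × N matrix K is the design choice that keeps the cost proportional to the size of the genotype matrix times the number of probes.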