
A scalable estimator of SNP heritability for biobank-scale data

MOTIVATION: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to u...


Bibliographic Details
Main authors: Wu, Yue; Sankararaman, Sriram
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022682/
https://www.ncbi.nlm.nih.gov/pubmed/29950019
http://dx.doi.org/10.1093/bioinformatics/bty253
author Wu, Yue
Sankararaman, Sriram
collection PubMed
description MOTIVATION: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation have motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation, where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens. RESULTS: We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moments estimator that has a runtime complexity of O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMB / max(log_3 N, log_3 M)). We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min. AVAILABILITY AND IMPLEMENTATION: The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.
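The randomized method-of-moments idea in the abstract can be illustrated with a minimal sketch. It is not the RHE-reg implementation, and it does not include the genotype-structure speedup. It assumes the usual single-component LMM, y = Xβ + ε, with a standardized N×M genotype matrix X, genetic relatedness matrix K = XXᵀ/M, and Cov(y) = σ_g² K + σ_e² I. Under that assumption the moment equations need tr(K²), which the sketch approximates with B random probe vectors so that K is never formed. The function names, parameters, and simulation settings are hypothetical and chosen only for illustration.

```python
import numpy as np


def randomized_mom_heritability(X, y, num_random_vectors=10, seed=0):
    """Randomized method-of-moments estimate of SNP heritability (illustrative).

    X : (N, M) genotype matrix, assumed standardized column-wise.
    y : (N,) phenotype vector, assumed mean-centered.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape

    # Stochastic estimate of tr(K^2) with K = X X^T / M:
    # for z with identity covariance, E[||K z||^2] = tr(K^2).
    Z = rng.standard_normal((N, num_random_vectors))
    KZ = X @ (X.T @ Z) / M  # matrix products only, O(N * M * B) work
    tr_K2 = np.sum(KZ ** 2) / num_random_vectors

    tr_K = np.sum(X ** 2) / M         # exact tr(K), without forming K
    yKy = np.sum((X.T @ y) ** 2) / M  # y^T K y
    yy = y @ y

    # Moment (normal) equations for the two variance components.
    A = np.array([[tr_K2, tr_K], [tr_K, float(N)]])
    b = np.array([yKy, yy])
    sigma_g2, sigma_e2 = np.linalg.solve(A, b)
    return sigma_g2 / (sigma_g2 + sigma_e2)


def simulate_phenotype(n_individuals=2000, n_snps=1000, h2=0.5, seed=1):
    """Simulate standardized genotypes and a phenotype with heritability h2."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, 0.3, size=(n_individuals, n_snps)).astype(float)
    X = (G - G.mean(axis=0)) / G.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2 / n_snps), size=n_snps)
    y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n_individuals)
    return X, y - y.mean()
```

Calling `randomized_mom_heritability(*simulate_phenotype())` returns an estimate that should lie near the simulated value of 0.5. Increasing `num_random_vectors` (the B of the abstract) reduces the noise in the trace estimate at proportionally higher cost.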
format Online
Article
Text
id pubmed-6022682
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022682 2018-07-05 A scalable estimator of SNP heritability for biobank-scale data Wu, Yue Sankararaman, Sriram Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022682/ /pubmed/29950019 http://dx.doi.org/10.1093/bioinformatics/bty253 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A scalable estimator of SNP heritability for biobank-scale data
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings