
Scaling probabilistic models of genetic variation to millions of humans

A major goal of population genetics is to quantitatively understand variation of genetic polymorphisms among individuals. The aggregated number of genotyped humans is currently on the order of millions of individuals, and existing methods do not scale to data of this size. To solve this problem we deve...


Bibliographic Details
Main Authors: Gopalan, Prem; Hao, Wei; Blei, David M.; Storey, John D.
Format: Online Article Text
Language: English
Published: 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5127768/
https://www.ncbi.nlm.nih.gov/pubmed/27819665
http://dx.doi.org/10.1038/ng.3710
author Gopalan, Prem
Hao, Wei
Blei, David M.
Storey, John D.
collection PubMed
description A major goal of population genetics is to quantitatively understand variation of genetic polymorphisms among individuals. The aggregated number of genotyped humans is currently on the order of millions of individuals, and existing methods do not scale to data of this size. To solve this problem we developed TeraStructure, an algorithm to fit Bayesian models of genetic variation in structured human populations on tera-sample-sized data sets (10^12 observed genotypes, e.g., 1M individuals at 1M SNPs). TeraStructure is a scalable approach to Bayesian inference in which subsamples of markers are used to update an estimate of the latent population structure between samples. We demonstrate that TeraStructure performs as well as existing methods on current globally sampled data, and we show using simulations that TeraStructure continues to be accurate and is the only method that can scale to tera-sample-sizes.
format Online
Article
Text
id pubmed-5127768
institution National Center for Biotechnology Information
language English
publishDate 2016
record_format MEDLINE/PubMed
spelling pubmed-5127768 2017-05-07 Scaling probabilistic models of genetic variation to millions of humans Gopalan, Prem Hao, Wei Blei, David M. Storey, John D. Nat Genet Article A major goal of population genetics is to quantitatively understand variation of genetic polymorphisms among individuals. The aggregated number of genotyped humans is currently on the order of millions of individuals, and existing methods do not scale to data of this size. To solve this problem we developed TeraStructure, an algorithm to fit Bayesian models of genetic variation in structured human populations on tera-sample-sized data sets (10^12 observed genotypes, e.g., 1M individuals at 1M SNPs). TeraStructure is a scalable approach to Bayesian inference in which subsamples of markers are used to update an estimate of the latent population structure between samples. We demonstrate that TeraStructure performs as well as existing methods on current globally sampled data, and we show using simulations that TeraStructure continues to be accurate and is the only method that can scale to tera-sample-sizes. 2016-11-07 2016-12 /pmc/articles/PMC5127768/ /pubmed/27819665 http://dx.doi.org/10.1038/ng.3710 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title Scaling probabilistic models of genetic variation to millions of humans
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5127768/
https://www.ncbi.nlm.nih.gov/pubmed/27819665
http://dx.doi.org/10.1038/ng.3710