Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data

With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of i...

Bibliographic Details
Main Authors: Bhaskar, Anand, Wang, Y.X. Rachel, Song, Yun S.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4315300/
https://www.ncbi.nlm.nih.gov/pubmed/25564017
http://dx.doi.org/10.1101/gr.178756.114
collection PubMed
description With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions.
id pubmed-4315300
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4315300 2015-08-01 Genome Res Method
Cold Spring Harbor Laboratory Press 2015-02 /pmc/articles/PMC4315300/ /pubmed/25564017 http://dx.doi.org/10.1101/gr.178756.114 Text en © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
topic Method