Robust inference of population size histories from genomic sequencing data

Unraveling the complex demographic histories of natural populations is a central problem in population genetics. Understanding past demographic events is of general anthropological interest, but is also an important step in establishing accurate null models when identifying adaptive or disease-assoc...

Bibliographic Details
Main Authors:	Upadhya, Gautam; Steinrücken, Matthias
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9518926/ https://www.ncbi.nlm.nih.gov/pubmed/36112715 http://dx.doi.org/10.1371/journal.pcbi.1010419

_version_	1784799293213769728
author	Upadhya, Gautam Steinrücken, Matthias
author_facet	Upadhya, Gautam Steinrücken, Matthias
author_sort	Upadhya, Gautam
collection	PubMed
description	Unraveling the complex demographic histories of natural populations is a central problem in population genetics. Understanding past demographic events is of general anthropological interest, but is also an important step in establishing accurate null models when identifying adaptive or disease-associated genetic variation. An important class of tools for inferring past population size changes from genomic sequence data are Coalescent Hidden Markov Models (CHMMs). These models make efficient use of the linkage information in population genomic datasets by using the local genealogies relating sampled individuals as latent states that evolve along the chromosome in an HMM framework. Extending these models to large sample sizes is challenging, since the number of possible latent states increases rapidly. Here, we present our method CHIMP (CHMM History-Inference Maximum-Likelihood Procedure), a novel CHMM method for inferring the size history of a population. It can be applied to large samples (hundreds of haplotypes) and only requires unphased genomes as input. The two implementations of CHIMP that we present here use either the height of the genealogical tree (T_MRCA) or the total branch length, respectively, as the latent variable at each position in the genome. The requisite transition and emission probabilities are obtained by numerically solving certain systems of differential equations derived from the ancestral process with recombination. The parameters of the population size history are subsequently inferred using an Expectation-Maximization algorithm. In addition, we implement a composite likelihood scheme to allow the method to scale to large sample sizes. We demonstrate the efficiency and accuracy of our method in a variety of benchmark tests using simulated data and present comparisons to other state-of-the-art methods. Specifically, our implementation using T_MRCA as the latent variable shows comparable performance and provides accurate estimates of effective population sizes in intermediate and ancient times. Our method is agnostic to the phasing of the data, which makes it a promising alternative in scenarios where high-quality data is not available, and has potential applications for pseudo-haploid data.
format	Online Article Text
id	pubmed-9518926
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9518926 2022-09-29 Robust inference of population size histories from genomic sequencing data Upadhya, Gautam Steinrücken, Matthias PLoS Comput Biol Research Article Public Library of Science 2022-09-16 /pmc/articles/PMC9518926/ /pubmed/36112715 http://dx.doi.org/10.1371/journal.pcbi.1010419 Text en © 2022 Upadhya, Steinrücken https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Upadhya, Gautam Steinrücken, Matthias Robust inference of population size histories from genomic sequencing data
title	Robust inference of population size histories from genomic sequencing data
title_sort	robust inference of population size histories from genomic sequencing data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9518926/ https://www.ncbi.nlm.nih.gov/pubmed/36112715 http://dx.doi.org/10.1371/journal.pcbi.1010419
work_keys_str_mv	AT upadhyagautam robustinferenceofpopulationsizehistoriesfromgenomicsequencingdata AT steinruckenmatthias robustinferenceofpopulationsizehistoriesfromgenomicsequencingdata
