Ultrafast genome-wide inference of pairwise coalescence times

Bibliographic Details
Main authors: Schweiger, Regev; Durbin, Richard
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2023
Subjects: Methods
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538485/
https://www.ncbi.nlm.nih.gov/pubmed/37562965
http://dx.doi.org/10.1101/gr.277665.123
_version_ 1785113316746592256
author Schweiger, Regev
Durbin, Richard
author_facet Schweiger, Regev
Durbin, Richard
author_sort Schweiger, Regev
collection PubMed
description The pairwise sequentially Markovian coalescent (PSMC) algorithm and its extensions infer the coalescence time of two homologous chromosomes at each genomic position. This inference is used in reconstructing demographic histories, detecting selection signatures, studying genome-wide associations, constructing ancestral recombination graphs, and more. Inference of coalescence times between each pair of haplotypes in a large data set is of great interest, as they may provide rich information about the population structure and history of the sample. Here, we introduce a new method, Gamma-SMC, which is more than 10 times faster than current methods. To obtain this speed-up, we represent the posterior coalescence time distributions succinctly as a gamma distribution with just two parameters; in contrast, PSMC and its extensions hold these in a vector over discrete intervals of time. Thus, Gamma-SMC has constant time-complexity per site, without dependence on the number of discrete time states. Additionally, because of this continuous representation, our method is able to infer times spanning many orders of magnitude and, as such, is robust to parameter misspecification. We describe how this approach works, show its performance on simulated and real data, and illustrate its use in studying recent positive selection in the 1000 Genomes Project data set.
format Online
Article
Text
id pubmed-10538485
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-10538485 2023-09-29 Ultrafast genome-wide inference of pairwise coalescence times Schweiger, Regev Durbin, Richard Genome Res Methods Cold Spring Harbor Laboratory Press 2023-07 /pmc/articles/PMC10538485/ /pubmed/37562965 http://dx.doi.org/10.1101/gr.277665.123 Text en © 2023 Schweiger and Durbin; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
spellingShingle Methods
Schweiger, Regev
Durbin, Richard
Ultrafast genome-wide inference of pairwise coalescence times
title Ultrafast genome-wide inference of pairwise coalescence times
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538485/
https://www.ncbi.nlm.nih.gov/pubmed/37562965
http://dx.doi.org/10.1101/gr.277665.123
work_keys_str_mv AT schweigerregev ultrafastgenomewideinferenceofpairwisecoalescencetimes
AT durbinrichard ultrafastgenomewideinferenceofpairwisecoalescencetimes
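
The description contrasts two ways of holding a posterior over coalescence time: a gamma distribution with two parameters, or a vector of probabilities over discrete time intervals. The sketch below only illustrates that difference in representation; it is not the Gamma-SMC implementation and does not perform any inference. The function names (`erlang_cdf`, `discretize`), the parameter values, and the interval edges are hypothetical, and the shape is restricted to an integer so that the CDF has a closed form computable with the standard library alone.

```python
import math


def erlang_cdf(t, shape, rate):
    """CDF of a gamma distribution with integer shape (an Erlang distribution)."""
    if t <= 0:
        return 0.0
    x = rate * t
    term, total = 1.0, 1.0
    # Accumulate sum_{k=0}^{shape-1} x^k / k!
    for k in range(1, shape):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def discretize(shape, rate, edges):
    """Probability mass in each interval [edges[i], edges[i+1]).

    `edges` must start at 0; the last interval is open-ended.
    """
    cdf = [erlang_cdf(e, shape, rate) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Continuous representation: two numbers, whatever the time scale.
shape, rate = 2, 1.0            # hypothetical values
mean = shape / rate             # mean of a gamma(shape, rate)
variance = shape / rate ** 2    # variance of a gamma(shape, rate)

# Discrete representation: one number per time interval.
edges = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]   # hypothetical interval boundaries
masses = discretize(shape, rate, edges)

print(f"gamma summary: shape={shape}, rate={rate}, mean={mean}, variance={variance}")
print(f"discretized summary: {len(masses)} values")
```

The discretized vector grows with the number of intervals and depends on where the boundaries are placed, whereas the two-parameter summary has a fixed size. This is the property the description associates with constant per-site cost and with robustness across many orders of magnitude in time.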