
Fast and accurate joint inference of coancestry parameters for populations and/or individuals

We introduce a fast, new algorithm for inferring from allele count data the F(ST) parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to F(ST) values. The tree can reflect historical processes of splitt...


Bibliographic Details
Main authors: Mary-Huard, Tristan; Balding, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888729/
https://www.ncbi.nlm.nih.gov/pubmed/36656906
http://dx.doi.org/10.1371/journal.pgen.1010054
_version_ 1784880585585459200
author Mary-Huard, Tristan
Balding, David
collection PubMed
description We introduce a fast, new algorithm for inferring from allele count data the F(ST) parameters describing genetic distances among a set of populations and/or unrelated diploid individuals, and a tree with branch lengths corresponding to F(ST) values. The tree can reflect historical processes of splitting and divergence, but seeks to represent the actual genetic variance as accurately as possible with a tree structure. We generalise two major approaches to defining F(ST), via correlations and mismatch probabilities of sampled allele pairs, which measure shared and non-shared components of genetic variance. A diploid individual can be treated as a population of two gametes, which allows inference of coancestry coefficients for individuals as well as for populations, or a combination of the two. A simulation study illustrates that our fast method-of-moments estimation of F(ST) values, simultaneously for multiple populations/individuals, gains statistical efficiency over pairwise approaches when the population structure is close to tree-like. We apply our approach to genome-wide genotypes from the 26 worldwide human populations of the 1000 Genomes Project. We first analyse at the population level, then a subset of individuals and in a final analysis we pool individuals from the more homogeneous populations. This flexible analysis approach gives advantages over traditional approaches to population structure/coancestry, including visual and quantitative assessments of long-standing questions about the relative magnitudes of within- and between-population genetic differences.
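The mismatch-probability view of F(ST) mentioned in the description can be illustrated with a minimal sketch. This is not the joint estimator of the article: it is a standard pairwise ratio-of-averages estimator of the Hudson type for biallelic loci, and the function names and the `(x, n)` input layout are assumptions made for this illustration only.

```python
def mismatch_within(x, n):
    """Probability that two distinct alleles drawn from one sample differ.

    x: count of the reference allele, n: total alleles sampled (n >= 2).
    Sampling without replacement gives an unbiased estimate.
    """
    return 2.0 * x * (n - x) / (n * (n - 1))


def mismatch_between(x1, n1, x2, n2):
    """Probability that one allele from each of two samples differ."""
    p1, p2 = x1 / n1, x2 / n2
    return p1 * (1 - p2) + p2 * (1 - p1)


def pairwise_fst(counts1, counts2):
    """Pairwise F_ST = 1 - (mean within mismatch) / (mean between mismatch).

    counts1, counts2: lists of (x, n) tuples, one per locus, same order.
    Numerator and denominator are summed over loci before taking the
    ratio. Assumes at least one locus has nonzero between-sample mismatch.
    """
    within = 0.0
    between = 0.0
    for (x1, n1), (x2, n2) in zip(counts1, counts2):
        within += 0.5 * (mismatch_within(x1, n1) + mismatch_within(x2, n2))
        between += mismatch_between(x1, n1, x2, n2)
    return 1.0 - within / between
```

Under this sketch a diploid individual is simply a sample with `n = 2`: a heterozygote is `(1, 2)` and a homozygote is `(0, 2)` or `(2, 2)`, so the same function accepts populations, individuals, or a mix of the two.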
format Online
Article
Text
id pubmed-9888729
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9888729 2023-02-01 Fast and accurate joint inference of coancestry parameters for populations and/or individuals Mary-Huard, Tristan Balding, David PLoS Genet Research Article
Public Library of Science 2023-01-19 /pmc/articles/PMC9888729/ /pubmed/36656906 http://dx.doi.org/10.1371/journal.pgen.1010054 Text en © 2023 Mary-Huard, Balding https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Fast and accurate joint inference of coancestry parameters for populations and/or individuals
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888729/
https://www.ncbi.nlm.nih.gov/pubmed/36656906
http://dx.doi.org/10.1371/journal.pgen.1010054