
A Continuous Correlated Beta Process Model for Genetic Ancestry in Admixed Populations

Admixture and recombination create populations and genomes with genetic ancestry from multiple source populations. Analyses of genetic ancestry in admixed populations are relevant for trait and disease mapping, studies of speciation, and conservation efforts. Consequently, many methods have been dev...

Bibliographic Details
Main author: Gompert, Zachariah
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788345/
https://www.ncbi.nlm.nih.gov/pubmed/26966908
http://dx.doi.org/10.1371/journal.pone.0151047
author Gompert, Zachariah
collection PubMed
description Admixture and recombination create populations and genomes with genetic ancestry from multiple source populations. Analyses of genetic ancestry in admixed populations are relevant for trait and disease mapping, studies of speciation, and conservation efforts. Consequently, many methods have been developed to infer genome-average ancestry and to deconvolute ancestry into continuous local ancestry blocks or tracts within individuals. Current methods for local ancestry inference perform well when admixture occurred recently or hybridization is ongoing, or when admixture occurred in the distant past such that local ancestry blocks have fixed in the admixed population. However, methods to infer local ancestry frequencies in isolated admixed populations still segregating for ancestry do not exist. In the current paper, I develop and test a continuous correlated beta process model to fill this analytical gap. The method explicitly models autocorrelations in ancestry frequencies at the population level and uses discriminant analysis of SNP windows to take advantage of ancestry blocks within individuals. Analyses of simulated data sets show that the method is generally accurate: ancestry frequency estimates exhibited low root-mean-square error and were highly correlated with the true values, particularly when large (±10 or ±20) SNP windows were used. The proposed method also outperformed post hoc inference of ancestry frequencies from a traditional hidden Markov model (i.e., the linkage model in structure), particularly when admixture occurred more distantly in the past with little ongoing gene flow or was followed by natural selection. The reliability and utility of the method were further assessed by analyzing genetic ancestry in an admixed human population (Uyghur) and three populations from a hybrid zone between Mus domesticus and M. musculus. Considerable variation in ancestry frequencies was detected within and among chromosomes in the Uyghur, with a large region of excess French ancestry harboring a gene with a known disease association. Similar variation was detected in the mouse hybrid zone, with notable constancy in regions of excess ancestry among admixed populations. By filling what has been an analytical gap, the proposed method should be a useful tool for many biologists. A computer program (popanc), written in C++, has been developed based on the proposed method and is available online at http://sourceforge.net/projects/popanc/.
format Online
Article
Text
id pubmed-4788345
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4788345 2016-03-23 A Continuous Correlated Beta Process Model for Genetic Ancestry in Admixed Populations Gompert, Zachariah PLoS One Research Article Public Library of Science 2016-03-11 /pmc/articles/PMC4788345/ /pubmed/26966908 http://dx.doi.org/10.1371/journal.pone.0151047 Text en © 2016 Zachariah Gompert http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Continuous Correlated Beta Process Model for Genetic Ancestry in Admixed Populations
topic Research Article