
A Continuous Correlated Beta Process Model for Genetic Ancestry in Admixed Populations


Bibliographic Details
Main author: Gompert, Zachariah
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788345/
https://www.ncbi.nlm.nih.gov/pubmed/26966908
http://dx.doi.org/10.1371/journal.pone.0151047
Description
Summary: Admixture and recombination create populations and genomes with genetic ancestry from multiple source populations. Analyses of genetic ancestry in admixed populations are relevant for trait and disease mapping, studies of speciation, and conservation efforts. Consequently, many methods have been developed to infer genome-average ancestry and to deconvolute ancestry into continuous local ancestry blocks or tracts within individuals. Current methods for local ancestry inference perform well when admixture occurred recently or hybridization is ongoing, or when admixture occurred in the distant past such that local ancestry blocks have fixed in the admixed population. However, methods to infer local ancestry frequencies in isolated admixed populations that are still segregating for ancestry do not exist. In the current paper, I develop and test a continuous correlated beta process model to fill this analytical gap. The method explicitly models autocorrelations in ancestry frequencies at the population level and uses discriminant analysis of SNP windows to take advantage of ancestry blocks within individuals. Analyses of simulated data sets show that the method is generally accurate: ancestry frequency estimates had low root-mean-square error and were highly correlated with the true values, particularly when large (±10 or ±20) SNP windows were used. Likewise, the proposed method outperformed post hoc inference of ancestry frequencies from a traditional hidden Markov model (i.e., the linkage model in structure), particularly when admixture occurred further in the past with little ongoing gene flow or was followed by natural selection. The reliability and utility of the method were further assessed by analyzing genetic ancestry in an admixed human population (Uyghur) and three populations from a hybrid zone between Mus domesticus and M. musculus.
Considerable variation in ancestry frequencies was detected within and among chromosomes in the Uyghur, with a large region of excess French ancestry harboring a gene with a known disease association. Similar variation was detected in the mouse hybrid zone, with notable constancy in regions of excess ancestry among admixed populations. By filling what has been an analytical gap, the proposed method should be a useful tool for many biologists. A computer program (popanc), written in C++, has been developed based on the proposed method and is available online at http://sourceforge.net/projects/popanc/.
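The summary reports simulation accuracy in terms of root-mean-square error (RMSE) and correlation between estimated and true ancestry frequencies. The sketch below shows how those two generic metrics can be computed for per-locus frequencies. It is not code from popanc: the function names `rmse` and `pearson_r` and the frequency values are hypothetical, chosen only for illustration, and the choice of Pearson's r as the correlation measure is an assumption.

```python
import math


def rmse(estimated, true):
    """Root-mean-square error between two equal-length sequences of frequencies."""
    if not estimated or len(estimated) != len(true):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((e - t) ** 2 for e, t in zip(estimated, true))
    return math.sqrt(squared / len(true))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("inputs must have equal length of at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for zero-variance input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ancestry frequencies at five loci (illustrative, not data from the paper).
true_freqs = [0.20, 0.25, 0.30, 0.40, 0.35]
est_freqs = [0.22, 0.24, 0.33, 0.38, 0.36]

print(f"RMSE = {rmse(est_freqs, true_freqs):.4f}")
print(f"r    = {pearson_r(est_freqs, true_freqs):.4f}")
```

A low RMSE indicates that estimates are close to the true values in absolute terms, while a high correlation indicates that they track the true pattern of variation along the genome. Reporting both, as the summary does, guards against estimates that are correlated with the truth but systematically biased.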