Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations

BACKGROUND: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively...

Bibliographic Details
Main Authors:	Bansal, Vikas; Libiger, Ondrej
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4301802/ https://www.ncbi.nlm.nih.gov/pubmed/25592880 http://dx.doi.org/10.1186/s12859-014-0418-7

_version_	1782353692256632832
author	Bansal, Vikas Libiger, Ondrej
author_facet	Bansal, Vikas Libiger, Ondrej
author_sort	Bansal, Vikas
collection	PubMed
description	BACKGROUND: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads. RESULTS: We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data making it a valuable tool for controlling for population stratification in sequencing based association studies that utilize DNA pooling. CONCLUSIONS: Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0418-7) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4301802
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4301802 2015-02-03 Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations Bansal, Vikas Libiger, Ondrej BMC Bioinformatics Methodology Article BACKGROUND: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads. RESULTS: We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data making it a valuable tool for controlling for population stratification in sequencing based association studies that utilize DNA pooling.
CONCLUSIONS: Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0418-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-01-16 /pmc/articles/PMC4301802/ /pubmed/25592880 http://dx.doi.org/10.1186/s12859-014-0418-7 Text en © Bansal and Libiger; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Bansal, Vikas Libiger, Ondrej Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations
title	Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations
title_full	Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations
title_fullStr	Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations
title_full_unstemmed	Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations
title_short	Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations
title_sort	fast individual ancestry inference from dna sequence data leveraging allele frequencies for multiple populations
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4301802/ https://www.ncbi.nlm.nih.gov/pubmed/25592880 http://dx.doi.org/10.1186/s12859-014-0418-7
work_keys_str_mv	AT bansalvikas fastindividualancestryinferencefromdnasequencedataleveragingallelefrequenciesformultiplepopulations AT libigerondrej fastindividualancestryinferencefromdnasequencedataleveragingallelefrequenciesformultiplepopulations
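
The RESULTS section of the abstract describes a maximum likelihood estimate of global admixture proportions, computed from reference-population allele frequencies and per-site genotype likelihoods with the BFGS algorithm. The following is a minimal sketch of that idea, not the authors' iAdmix implementation. It assumes biallelic sites, Hardy-Weinberg genotype priors given the mixed allele frequency, independence between sites, and a softmax reparameterization so that unconstrained BFGS yields proportions that sum to 1. All function names, the simulated data, and the 1% genotype error rate are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def _proportions(z):
    """Map k-1 unconstrained values to k proportions (last logit fixed at 0)."""
    logits = np.append(z, 0.0)
    logits -= logits.max()  # numerical stability
    e = np.exp(logits)
    return e / e.sum()


def neg_log_likelihood(z, geno_lik, pop_freqs):
    """Negative log-likelihood of admixture proportions.

    geno_lik:  (n_sites, 3) array, P(data | genotype g) for g = 0, 1, 2
               copies of the alternate allele.
    pop_freqs: (n_pops, n_sites) alternate-allele frequencies.
    """
    q = _proportions(z)
    f = np.clip(q @ pop_freqs, 1e-6, 1 - 1e-6)  # mixed frequency per site
    # Hardy-Weinberg genotype probabilities given f (an assumption here).
    prior = np.stack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2], axis=1)
    site_lik = (geno_lik * prior).sum(axis=1)  # sum over uncertain genotypes
    return -np.log(site_lik).sum()


def estimate_admixture(geno_lik, pop_freqs):
    """Return estimated admixture proportions, one per reference population."""
    k = pop_freqs.shape[0]
    res = minimize(neg_log_likelihood, np.zeros(k - 1),
                   args=(geno_lik, pop_freqs), method="BFGS")
    return _proportions(res.x)


def simulate_genotype_likelihoods(true_q, pop_freqs, rng, err=0.01):
    """Simulate one admixed individual (hypothetical helper for the demo)."""
    f = true_q @ pop_freqs
    genotypes = rng.binomial(2, f)
    gl = np.full((pop_freqs.shape[1], 3), err)
    gl[np.arange(len(genotypes)), genotypes] = 1 - 2 * err
    return gl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs = rng.uniform(0.05, 0.95, size=(3, 5000))
    gl = simulate_genotype_likelihoods(np.array([0.6, 0.3, 0.1]), freqs, rng)
    print(estimate_admixture(gl, freqs))
```

Summing over the three possible genotypes at each site is what lets the same likelihood handle both called genotypes (one likelihood near 1) and low-coverage sequence data (flatter likelihoods). Only population-level allele frequencies enter the computation, so no individual-level reference panel is needed.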

Similar Items