
Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure

BACKGROUND: Accurate inference of genetic ancestry is of fundamental interest to many biomedical, forensic, and anthropological research areas. Genetic ancestry memberships may relate to genetic disease risks. In a genome association study, failing to account for differences in genetic ancestry betw...

Bibliographic Details
Main Authors: Byun, Jinyoung; Han, Younghun; Gorlov, Ivan P.; Busam, Jonathan A.; Seldin, Michael F.; Amos, Christopher I.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644186/
https://www.ncbi.nlm.nih.gov/pubmed/29037167
http://dx.doi.org/10.1186/s12864-017-4166-8
author Byun, Jinyoung
Han, Younghun
Gorlov, Ivan P.
Busam, Jonathan A.
Seldin, Michael F.
Amos, Christopher I.
collection PubMed
description BACKGROUND: Accurate inference of genetic ancestry is of fundamental interest to many biomedical, forensic, and anthropological research areas. Genetic ancestry memberships may relate to genetic disease risks. In a genome association study, failing to account for differences in genetic ancestry between cases and controls may also lead to false-positive results. Although a number of strategies for inferring and taking into account the confounding effects of genetic ancestry are available, applying them to large studies (tens of thousands of samples) is challenging. The goal of this study is to develop an approach for inferring the genetic ancestry of samples with unknown ancestry among closely related populations and to provide accurate estimates of ancestry for application to large-scale studies. METHODS: In this study we developed a novel distance-based approach, Ancestry Inference using Principal component analysis and Spatial analysis (AIPS), which incorporates an Inverse Distance Weighted (IDW) interpolation method from spatial analysis to assign individuals to population memberships. RESULTS: We demonstrate the benefits of AIPS in analyzing population substructure, specifically relative to the four most commonly used tools, EIGENSTRAT, STRUCTURE, fastSTRUCTURE, and ADMIXTURE, using genotype data from various intra-European panels and European-Americans. While these commonly used tools performed poorly in inferring ancestry from a large number of subpopulations, AIPS accurately distinguished variations between and within subpopulations. CONCLUSIONS: Our results show that AIPS can be applied to large-scale data sets to discriminate the modest variability among intra-continental populations as well as to characterize inter-continental variation. The method we developed will protect against spurious associations when mapping the genetic basis of a disease. Our approach is a more accurate and computationally efficient method for inferring genetic ancestry in large-scale genetic studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-017-4166-8) contains supplementary material, which is available to authorized users.
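The METHODS summary above combines principal component analysis with inverse distance weighted (IDW) interpolation. The following is a minimal sketch of that general idea, not the authors' AIPS implementation: the function names `pca_scores` and `idw_membership`, the `power` and `eps` parameters, the choice of two components, and the simulated genotypes are all assumptions made for illustration. Each query sample receives a membership score per reference population, computed from the inverse distances to labelled reference samples in PC space.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def idw_membership(ref_scores, ref_labels, query_scores, power=2.0, eps=1e-12):
    """Inverse-distance-weighted membership of each query in each population.

    Hypothetical helper: weights are 1 / distance**power, summed per
    reference population and normalised so each row sums to 1.
    """
    ref_scores = np.asarray(ref_scores, dtype=float)
    query_scores = np.asarray(query_scores, dtype=float)
    ref_labels = np.asarray(ref_labels)
    populations = np.unique(ref_labels)
    # Pairwise Euclidean distances, shape (n_query, n_ref).
    diff = query_scores[:, None, :] - ref_scores[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # eps guards against division by zero when a query coincides with a reference.
    weights = 1.0 / np.maximum(dist, eps) ** power
    # One-hot indicator of population per reference sample, shape (n_ref, n_pop).
    indicator = (ref_labels[:, None] == populations[None, :]).astype(float)
    membership = weights @ indicator
    membership /= membership.sum(axis=1, keepdims=True)
    return populations, membership

# Simulated data (an assumption): two populations with very different
# allele frequencies at 200 SNPs, 25 samples each.
rng = np.random.default_rng(0)
freq_a = np.full(200, 0.1)
freq_b = np.full(200, 0.9)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(25, 200)),
    rng.binomial(2, freq_b, size=(25, 200)),
]).astype(float)

scores = pca_scores(geno)
ref_idx = np.r_[0:20, 25:45]      # samples treated as having known ancestry
query_idx = np.r_[20:25, 45:50]   # samples treated as having unknown ancestry
ref_labels = ["A"] * 20 + ["B"] * 20

populations, membership = idw_membership(
    scores[ref_idx], ref_labels, scores[query_idx]
)
assigned = populations[membership.argmax(axis=1)]
```

In this sketch the reference and query samples are projected in a single joint PCA for simplicity; whether the published method projects query samples that way is not stated in the abstract. A larger `power` makes the assignment depend more strongly on the nearest reference samples.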
format Online
Article
Text
id pubmed-5644186
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5644186 2017-10-26 BMC Genomics Methodology Article BioMed Central 2017-10-16 /pmc/articles/PMC5644186/ /pubmed/29037167 http://dx.doi.org/10.1186/s12864-017-4166-8 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure
topic Methodology Article