
Iterative pruning PCA improves resolution of highly structured populations

BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly...


Bibliographic Details
Main authors: Intarapanich, Apichart; Shaw, Philip J; Assawamakin, Anunchai; Wangkumhang, Pongsakorn; Ngamphiw, Chumpol; Chaichoompu, Kridsadakorn; Piriyapongsa, Jittima; Tongsima, Sissades
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Methodology article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2790469/
https://www.ncbi.nlm.nih.gov/pubmed/19930644
http://dx.doi.org/10.1186/1471-2105-10-382
author Intarapanich, Apichart
Shaw, Philip J
Assawamakin, Anunchai
Wangkumhang, Pongsakorn
Ngamphiw, Chumpol
Chaichoompu, Kridsadakorn
Piriyapongsa, Jittima
Tongsima, Sissades
collection PubMed
description BACKGROUND: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Populations are therefore structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. Computationally efficient non-parametric methods, chiefly those based on Principal Components Analysis (PCA), are thus increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, their accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming. RESULTS: A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed, which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with those of the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by the other methods. CONCLUSION: The algorithm is computationally efficient and not constrained by dataset complexity. This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
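The description above outlines the idea of repeatedly applying PCA to ever smaller groups of individuals so that closely related subpopulations are no longer masked by a distant one. The abstract does not state how groups are split or when splitting stops, so the following is only a minimal sketch under stated assumptions: a plain 2-means split on the top two principal components and a crude eigenvalue-ratio threshold are stand-ins chosen for illustration, and the names `pca_scores`, `two_means`, `ip_pca_sketch`, `ratio_threshold` and `min_size` are hypothetical. It is not the published ipPCA implementation, and the data are synthetic.

```python
import numpy as np

def pca_scores(X, k=2):
    """Top-k principal component scores, plus the ratio of the first
    eigenvalue to the mean of the remaining eigenvalues."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    eig = s ** 2
    rest = eig[1:].mean() if eig.size > 1 else 0.0
    ratio = eig[0] / rest if rest > 0 else 0.0
    return U[:, :k] * s[:k], ratio

def two_means(Z, n_iter=100):
    """Plain 2-means, seeded with the two extreme points along PC1."""
    centers = Z[[Z[:, 0].argmin(), Z[:, 0].argmax()]].astype(float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        if labels.min() == labels.max():  # degenerate: a single cluster
            break
        new_centers = np.array([Z[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def ip_pca_sketch(X, ratio_threshold=8.0, min_size=10):
    """Recompute PCA on each group and bisect it until no group shows
    a dominant first eigenvalue (assumed stopping rule, not from the source)."""
    labels = np.zeros(len(X), dtype=int)
    pending = [np.arange(len(X))]
    next_label = 0
    while pending:
        idx = pending.pop()
        split = None
        if len(idx) >= 2 * min_size:
            scores, ratio = pca_scores(X[idx])
            if ratio > ratio_threshold:
                part = two_means(scores)
                if min(np.sum(part == 0), np.sum(part == 1)) >= min_size:
                    split = part
        if split is None:
            # No further structure detected: this group is a final subpopulation.
            labels[idx] = next_label
            next_label += 1
        else:
            # Prune: analyse each half separately, with PCA recomputed on it.
            pending.append(idx[split == 0])
            pending.append(idx[split == 1])
    return labels

# Synthetic stand-in for genotype data: groups A and B are close,
# group C is far removed from both.
rng = np.random.default_rng(0)
means = np.zeros((3, 20))
means[1, 0] = 10.0
means[2, 1] = 60.0
truth = np.repeat([0, 1, 2], 60)
X = means[truth] + rng.normal(size=(180, 20))
found = ip_pca_sketch(X)
print(len(set(found.tolist())), "groups found")
```

Recomputing the PCA inside each group is the design point the sketch is meant to show: once the distant group has been pruned away, the first principal component of the remaining individuals is free to align with the smaller difference between the two close groups. The threshold of 8.0 suits this toy data only and would need a principled replacement for real genotype data.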
format Text
id pubmed-2790469
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2790469 2009-12-09 BMC Bioinformatics Methodology article BioMed Central 2009-11-23 /pmc/articles/PMC2790469/ /pubmed/19930644 http://dx.doi.org/10.1186/1471-2105-10-382 Text en Copyright ©2009 Intarapanich et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Iterative pruning PCA improves resolution of highly structured populations
topic Methodology article