SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies
Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to clu...
Main authors: | Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3470591/ https://www.ncbi.nlm.nih.gov/pubmed/23077494 http://dx.doi.org/10.1371/journal.pone.0045685 |
_version_ | 1782246299725201408 |
---|---|
author | Bouaziz, Matthieu Paccard, Caroline Guedj, Mickael Ambroise, Christophe |
author_sort | Bouaziz, Matthieu |
collection | PubMed |
description | Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising solution to infer fine-scale genetic patterns. |
format | Online Article Text |
id | pubmed-3470591 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3470591 2012-10-17 PLoS One Research Article Public Library of Science 2012-10-12 /pmc/articles/PMC3470591/ /pubmed/23077494 http://dx.doi.org/10.1371/journal.pone.0045685 Text en © 2012 Bouaziz et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3470591/ https://www.ncbi.nlm.nih.gov/pubmed/23077494 http://dx.doi.org/10.1371/journal.pone.0045685 |
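
The description in this record outlines a divisive hierarchical strategy built on spectral clustering. The following Python sketch illustrates that general idea only and is not the authors' SHIPS implementation: the function names, the Gaussian similarity with a median-distance bandwidth, and the stopping rule are all assumptions made for illustration. In particular, a fixed `max_clusters` stands in for the gap statistic that the record says SHIPS uses to estimate the number of sub-populations, and the genotypes are simulated toy data.

```python
import numpy as np


def spectral_bipartition(X):
    """Split the rows of X into two groups using the sign of the Fiedler vector."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between individuals.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Gaussian similarity; the median-distance bandwidth is an assumption.
    bandwidth = np.median(d2[~np.eye(n, dtype=bool)])
    if bandwidth == 0.0:
        bandwidth = 1.0
    W = np.exp(-d2 / bandwidth)
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1] * d_inv_sqrt
    return fiedler >= 0.0


def divisive_clustering(X, max_clusters=2, min_size=4):
    """Repeatedly bipartition the largest cluster (hypothetical stopping rule)."""
    X = np.asarray(X, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    while labels.max() + 1 < max_clusters:
        sizes = np.bincount(labels)
        target = int(np.argmax(sizes))
        if sizes[target] < 2 * min_size:
            break  # remaining clusters are too small to split further
        idx = np.flatnonzero(labels == target)
        right = spectral_bipartition(X[idx])
        if right.all() or not right.any():
            break  # degenerate split: stop instead of looping forever
        labels[idx[right]] = labels.max() + 1
    return labels


# Toy data: two simulated populations, 200 SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.2, size=(20, 200))
pop_b = rng.binomial(2, 0.8, size=(20, 200))
genotypes = np.vstack([pop_a, pop_b])
labels = divisive_clustering(genotypes, max_clusters=2)
print(labels)
```

Splitting on the sign of a single eigenvector keeps every step a two-way decision, which is what makes a progressive, tree-like investigation of structure possible. A fuller treatment would replace `max_clusters` with a data-driven criterion such as the gap statistic mentioned in the description.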