SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies

Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to clu...

Bibliographic Details
Main Authors: Bouaziz, Matthieu, Paccard, Caroline, Guedj, Mickael, Ambroise, Christophe
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3470591/
https://www.ncbi.nlm.nih.gov/pubmed/23077494
http://dx.doi.org/10.1371/journal.pone.0045685
author Bouaziz, Matthieu
Paccard, Caroline
Guedj, Mickael
Ambroise, Christophe
collection PubMed
description Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular, but their underlying complexity and high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. In this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and uses the gap statistic to estimate the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, namely data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets, both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the most consistent with the population labels or those produced by the Admixture program. The performance of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use, makes this method a promising solution for inferring fine-scale genetic patterns.
format Online
Article
Text
id pubmed-3470591
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3470591 2012-10-17 SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies Bouaziz, Matthieu Paccard, Caroline Guedj, Mickael Ambroise, Christophe PLoS One Research Article Public Library of Science 2012-10-12 /pmc/articles/PMC3470591/ /pubmed/23077494 http://dx.doi.org/10.1371/journal.pone.0045685 Text en © 2012 Bouaziz et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3470591/
https://www.ncbi.nlm.nih.gov/pubmed/23077494
http://dx.doi.org/10.1371/journal.pone.0045685
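
The description above mentions a divisive hierarchical clustering strategy built on spectral clustering. The following is only a minimal generic sketch of that idea, not the SHIPS implementation: the Gaussian similarity, the split on the sign of the second eigenvector of the normalized Laplacian, and the names `similarity`, `bipartition`, `divisive_clustering`, `max_depth` and `min_size` are all assumptions made for illustration. In particular, the gap statistic that the article uses to choose the number of sub-populations is not implemented here; a fixed depth limit stands in for it.

```python
import numpy as np


def similarity(genotypes):
    """Gaussian similarity between individuals (assumed kernel, not from the article).

    genotypes: array of shape (n_individuals, n_snps) with 0/1/2 allele counts.
    """
    g = np.asarray(genotypes, dtype=float)
    # Squared Euclidean distance between every pair of individuals.
    d2 = ((g[:, None, :] - g[None, :, :]) ** 2).sum(axis=2)
    positive = d2[d2 > 0]
    scale = np.median(positive) if positive.size else 1.0
    return np.exp(-d2 / scale)


def bipartition(sim):
    """Split one cluster in two using the sign of the second Laplacian eigenvector."""
    deg = sim.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}.
    lap = np.eye(len(sim)) - d_inv_sqrt[:, None] * sim * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues returned in ascending order
    return vecs[:, 1] >= 0


def divisive_clustering(sim, max_depth=1, min_size=2):
    """Top-down clustering: repeatedly bipartition clusters until max_depth.

    max_depth is a stand-in stopping rule (assumption); it replaces the
    gap-statistic-based choice described in the article.
    """
    labels = np.zeros(len(sim), dtype=int)
    stack = [(np.arange(len(sim)), 0)]
    next_label = 1
    while stack:
        idx, depth = stack.pop()
        if depth >= max_depth or len(idx) < 2 * min_size:
            continue
        mask = bipartition(sim[np.ix_(idx, idx)])
        left, right = idx[mask], idx[~mask]
        if len(left) < min_size or len(right) < min_size:
            continue  # reject splits that peel off tiny groups
        labels[right] = next_label  # left half keeps its current label
        next_label += 1
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    return labels


# Toy usage: two simulated populations with different allele frequencies.
rng = np.random.default_rng(0)
demo_genotypes = np.vstack([
    rng.binomial(2, 0.1, size=(20, 200)),
    rng.binomial(2, 0.9, size=(20, 200)),
])
demo_labels = divisive_clustering(similarity(demo_genotypes), max_depth=1)
```

Raising `max_depth` lets the procedure keep subdividing the clusters it has already found, which mirrors the progressive investigation of structure that the description refers to, although without a principled stopping rule it will also split groups that are genuinely homogeneous.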