Discriminant analysis of principal components: a new method for the analysis of genetically structured populations

BACKGROUND: The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based...

Bibliographic Details
Main authors:	Jombart, Thibaut; Devillard, Sébastien; Balloux, François
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2973851/ https://www.ncbi.nlm.nih.gov/pubmed/20950446 http://dx.doi.org/10.1186/1471-2156-11-94

collection	PubMed
description	BACKGROUND: The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based on pre-defined population genetics models such as the STRUCTURE or BAPS software may not be able to cope with this unprecedented amount of data. Thus, there is a need for less computer-intensive approaches. Multivariate analyses seem particularly appealing as they are specifically devoted to extracting information from large datasets. Unfortunately, currently available multivariate methods still lack some essential features needed to study the genetic structure of natural populations. RESULTS: We introduce the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. Our approach allows extracting rich information from genetic data, providing assignment of individuals to groups, a visual assessment of between-population differentiation, and contribution of individual alleles to population structuring. We evaluate the performance of our method using simulated data, which were also analyzed using STRUCTURE as a benchmark. Additionally, we illustrate the method by analyzing microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza. CONCLUSIONS: Analysis of simulated data revealed that our approach performs generally better than STRUCTURE at characterizing population subdivision. The tools implemented in DAPC for the identification of clusters and graphical representation of between-group structures allow to unravel complex population structures. Our approach is also faster than Bayesian clustering algorithms by several orders of magnitude, and may be applicable to a wider range of datasets.
id	pubmed-2973851
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2973851 2010-11-05 BMC Genet Methodology Article BioMed Central 2010-10-15 /pmc/articles/PMC2973851/ /pubmed/20950446 http://dx.doi.org/10.1186/1471-2156-11-94 Text en Copyright ©2010 Jombart et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Similar items