
A fast likelihood solution to the genetic clustering problem

1. The investigation of genetic clusters in natural populations is a ubiquitous problem in a range of fields relying on the analysis of genetic data, such as molecular ecology, conservation biology and microbiology. Typically, genetic clusters are defined as distinct panmictic populations, or paren...


Bibliographic Details
Main authors:	Beugin, Marie‐Pauline; Gayet, Thibault; Pontier, Dominique; Devillard, Sébastien; Jombart, Thibaut
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2018
Subjects:	Population Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5993310/ https://www.ncbi.nlm.nih.gov/pubmed/29938015 http://dx.doi.org/10.1111/2041-210X.12968

_version_	1783330224160636928
author	Beugin, Marie‐Pauline; Gayet, Thibault; Pontier, Dominique; Devillard, Sébastien; Jombart, Thibaut
author_facet	Beugin, Marie‐Pauline; Gayet, Thibault; Pontier, Dominique; Devillard, Sébastien; Jombart, Thibaut
author_sort	Beugin, Marie‐Pauline
collection	PubMed
description	1. The investigation of genetic clusters in natural populations is a ubiquitous problem in a range of fields relying on the analysis of genetic data, such as molecular ecology, conservation biology and microbiology. Typically, genetic clusters are defined as distinct panmictic populations, or parental groups in the context of hybridisation. Two types of methods have been developed for identifying such clusters: model‐based methods, which are usually computer‐intensive but yield results that can be interpreted in the light of an explicit population genetic model, and geometric approaches, which are less interpretable but markedly faster. 2. Here, we introduce snapclust, a fast maximum‐likelihood solution to the genetic clustering problem, which combines the advantages of both model‐based and geometric approaches. Our method relies on maximising the likelihood of a fixed number of panmictic populations, combining a geometric approach with fast likelihood optimisation via the Expectation‐Maximisation (EM) algorithm. It can be used for assigning genotypes to populations and, optionally, for identifying various types of hybrids between two parental populations. Several goodness‐of‐fit statistics can also be used to guide the choice of the retained number of clusters. 3. Using extensive simulations, we show that snapclust performs comparably to current gold standards for genetic clustering as well as hybrid detection, with some advantages for identifying hybrids after several backcrosses, while being orders of magnitude faster than other model‐based methods. We also illustrate how snapclust can be used for identifying the optimal number of clusters, and subsequently assigning individuals to various hybrid classes simulated from an empirical microsatellite dataset. 4. snapclust is implemented in the package adegenet for the free software R, and is therefore easily integrated into existing pipelines for genetic data analysis. It can be applied to any kind of co‐dominant marker, and can easily be extended to more complex models including, for instance, varying ploidy levels. Given its flexibility and computational efficiency, it provides a useful complement to the existing toolbox for the study of genetic diversity in natural populations.
format	Online Article Text
id	pubmed-5993310
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-5993310 2018-06-20 A fast likelihood solution to the genetic clustering problem Beugin, Marie‐Pauline Gayet, Thibault Pontier, Dominique Devillard, Sébastien Jombart, Thibaut Methods Ecol Evol Population Genetics John Wiley and Sons Inc. 2018-01-30 2018-04 /pmc/articles/PMC5993310/ /pubmed/29938015 http://dx.doi.org/10.1111/2041-210X.12968 Text en © 2018 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Population Genetics Beugin, Marie‐Pauline Gayet, Thibault Pontier, Dominique Devillard, Sébastien Jombart, Thibaut A fast likelihood solution to the genetic clustering problem
title	A fast likelihood solution to the genetic clustering problem
title_full	A fast likelihood solution to the genetic clustering problem
title_fullStr	A fast likelihood solution to the genetic clustering problem
title_full_unstemmed	A fast likelihood solution to the genetic clustering problem
title_short	A fast likelihood solution to the genetic clustering problem
title_sort	fast likelihood solution to the genetic clustering problem
topic	Population Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5993310/ https://www.ncbi.nlm.nih.gov/pubmed/29938015 http://dx.doi.org/10.1111/2041-210X.12968
work_keys_str_mv	AT beuginmariepauline afastlikelihoodsolutiontothegeneticclusteringproblem AT gayetthibault afastlikelihoodsolutiontothegeneticclusteringproblem AT pontierdominique afastlikelihoodsolutiontothegeneticclusteringproblem AT devillardsebastien afastlikelihoodsolutiontothegeneticclusteringproblem AT jombartthibaut afastlikelihoodsolutiontothegeneticclusteringproblem AT beuginmariepauline fastlikelihoodsolutiontothegeneticclusteringproblem AT gayetthibault fastlikelihoodsolutiontothegeneticclusteringproblem AT pontierdominique fastlikelihoodsolutiontothegeneticclusteringproblem AT devillardsebastien fastlikelihoodsolutiontothegeneticclusteringproblem AT jombartthibaut fastlikelihoodsolutiontothegeneticclusteringproblem
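
The abstract describes maximising the likelihood of a fixed number of panmictic populations with the EM algorithm, starting from a geometric grouping. The following is a minimal Python sketch of that general idea, not the adegenet implementation (snapclust itself is an R function). It assumes diploid co‐dominant genotypes coded as pairs of allele indices, Hardy‐Weinberg proportions within each cluster, equal mixing weights and a small pseudo‐count; the function names, the toy data and the hand‐supplied initial grouping (a stand‐in for the geometric step) are all hypothetical.

```python
import math

def genotype_loglik(genotype, freqs):
    """Log-probability of one diploid genotype under Hardy-Weinberg proportions."""
    total = 0.0
    for (a, b), p in zip(genotype, freqs):
        # Homozygote: p_a^2; heterozygote: 2 * p_a * p_b.
        total += math.log(p[a] * p[b] * (1.0 if a == b else 2.0))
    return total

def estimate_freqs(genotypes, weights, n_alleles, pseudo=0.01):
    """M-step for one cluster: membership-weighted allele frequencies per locus.
    The pseudo-count is an assumption that keeps every frequency above zero."""
    freqs = []
    for locus, n in enumerate(n_alleles):
        counts = [pseudo] * n
        for geno, w in zip(genotypes, weights):
            a, b = geno[locus]
            counts[a] += w
            counts[b] += w
        total = sum(counts)
        freqs.append([c / total for c in counts])
    return freqs

def em_cluster(genotypes, init_groups, k, n_alleles, max_iter=100, tol=1e-8):
    """Alternate M- and E-steps until the mixture log-likelihood stabilises."""
    n = len(genotypes)
    # Hard initial memberships taken from the supplied grouping.
    member = [[1.0 if init_groups[i] == g else 0.0 for g in range(k)]
              for i in range(n)]
    prev = -math.inf
    loglik = prev
    for _ in range(max_iter):
        freqs = [estimate_freqs(genotypes, [member[i][g] for i in range(n)],
                                n_alleles) for g in range(k)]
        loglik = 0.0
        for i, geno in enumerate(genotypes):
            ll = [genotype_loglik(geno, freqs[g]) for g in range(k)]
            top = max(ll)  # subtract the maximum for numerical stability
            w = [math.exp(x - top) for x in ll]
            s = sum(w)
            member[i] = [x / s for x in w]       # E-step: posterior membership
            loglik += top + math.log(s / k)      # equal mixing weights 1/k
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    groups = [max(range(k), key=lambda g: member[i][g]) for i in range(n)]
    return groups, member, loglik

# Toy data: 6 individuals, 2 biallelic loci; the third individual starts
# in the wrong group on purpose.
genotypes = [
    [(0, 0), (0, 0)], [(0, 0), (0, 1)], [(0, 0), (0, 0)],
    [(1, 1), (1, 1)], [(1, 1), (0, 1)], [(1, 1), (1, 1)],
]
groups, member, loglik = em_cluster(genotypes, [0, 0, 1, 1, 1, 1],
                                    k=2, n_alleles=[2, 2])
```

Here `groups` holds the most probable cluster for each individual and `member` the membership probabilities; a real analysis would use many more loci and a data‐driven starting partition.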
