Cargando…

Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods

BACKGROUND: A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and t...

Descripción completa

Detalles Bibliográficos
Autores principales:	Onogi, Akio, Nurimoto, Masanobu, Morita, Mitsuo
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2011
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161044/ https://www.ncbi.nlm.nih.gov/pubmed/21708038 http://dx.doi.org/10.1186/1471-2105-12-263

_version_	1782210630503104512
author	Onogi, Akio Nurimoto, Masanobu Morita, Mitsuo
author_facet	Onogi, Akio Nurimoto, Masanobu Morita, Mitsuo
author_sort	Onogi, Akio
collection	PubMed
description	BACKGROUND: A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and therefore, the use of this method is relatively uncommon. We characterized the DP prior method to increase its practical use. RESULTS: First, we evaluated the usefulness of the sequentially-allocated merge-split (SAMS) sampler, which is a technique for improving the mixing of Markov chain Monte Carlo algorithms. Although this sampler has been implemented in a preceding program, HWLER, its effectiveness has not been investigated. We showed that this sampler was effective for population structure analysis. Implementation of this sampler was useful with regard to the accuracy of inference and computational time. Second, we examined the effect of a hyperparameter for the prior distribution of allele frequencies and showed that the specification of this parameter was important and could be resolved by considering the parameter as a variable. Third, we compared the DP prior method with other Bayesian clustering methods and showed that the DP prior method was suitable for data sets with unbalanced sample sizes among populations. In contrast, although current popular algorithms for population structure analysis, such as those implemented in STRUCTURE, were suitable for data sets with uniform sample sizes, inferences with these algorithms for unbalanced sample sizes tended to be less accurate than those with the DP prior method. CONCLUSIONS: The clustering method based on the DP prior was found to be useful because it can infer the number of populations and simultaneously assign individuals into populations, and it is suitable for data sets with unbalanced sample sizes among populations. Here we presented a novel program, DPART, that implements the SAMS sampler and can consider the hyperparameter for the prior distribution of allele frequencies to be a variable.
format	Online Article Text
id	pubmed-3161044
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-31610442011-08-25 Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods Onogi, Akio Nurimoto, Masanobu Morita, Mitsuo BMC Bioinformatics Research Article BACKGROUND: A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structures because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and therefore, the use of this method is relatively uncommon. We characterized the DP prior method to increase its practical use. RESULTS: First, we evaluated the usefulness of the sequentially-allocated merge-split (SAMS) sampler, which is a technique for improving the mixing of Markov chain Monte Carlo algorithms. Although this sampler has been implemented in a preceding program, HWLER, its effectiveness has not been investigated. We showed that this sampler was effective for population structure analysis. Implementation of this sampler was useful with regard to the accuracy of inference and computational time. Second, we examined the effect of a hyperparameter for the prior distribution of allele frequencies and showed that the specification of this parameter was important and could be resolved by considering the parameter as a variable. Third, we compared the DP prior method with other Bayesian clustering methods and showed that the DP prior method was suitable for data sets with unbalanced sample sizes among populations. In contrast, although current popular algorithms for population structure analysis, such as those implemented in STRUCTURE, were suitable for data sets with uniform sample sizes, inferences with these algorithms for unbalanced sample sizes tended to be less accurate than those with the DP prior method. CONCLUSIONS: The clustering method based on the DP prior was found to be useful because it can infer the number of populations and simultaneously assign individuals into populations, and it is suitable for data sets with unbalanced sample sizes among populations. Here we presented a novel program, DPART, that implements the SAMS sampler and can consider the hyperparameter for the prior distribution of allele frequencies to be a variable. BioMed Central 2011-06-28 /pmc/articles/PMC3161044/ /pubmed/21708038 http://dx.doi.org/10.1186/1471-2105-12-263 Text en Copyright ©2011 Onogi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Onogi, Akio Nurimoto, Masanobu Morita, Mitsuo Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods
title	Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods
title_full	Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods
title_fullStr	Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods
title_full_unstemmed	Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods
title_short	Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods
title_sort	characterization of a bayesian genetic clustering algorithm based on a dirichlet process prior and comparison among bayesian clustering methods
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161044/ https://www.ncbi.nlm.nih.gov/pubmed/21708038 http://dx.doi.org/10.1186/1471-2105-12-263
work_keys_str_mv	AT onogiakio characterizationofabayesiangeneticclusteringalgorithmbasedonadirichletprocesspriorandcomparisonamongbayesianclusteringmethods AT nurimotomasanobu characterizationofabayesiangeneticclusteringalgorithmbasedonadirichletprocesspriorandcomparisonamongbayesianclusteringmethods AT moritamitsuo characterizationofabayesiangeneticclusteringalgorithmbasedonadirichletprocesspriorandcomparisonamongbayesianclusteringmethods

Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods

Ejemplares similares