
Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search

BACKGROUND: Sampling core subsets from genetic resources while maintaining as much as possible the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets b...


Bibliographic Details
Main authors:	Beukelaer, Herman De; Smýkal, Petr; Davenport, Guy F; Fack, Veerle
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3554476/ https://www.ncbi.nlm.nih.gov/pubmed/23174036 http://dx.doi.org/10.1186/1471-2105-13-312

_version_	1782256899663593472
author	Beukelaer, Herman De; Smýkal, Petr; Davenport, Guy F; Fack, Veerle
collection	PubMed
description	BACKGROUND: Sampling core subsets from genetic resources while maintaining as much as possible of the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. First, we investigate the effect of minimum (instead of the default mean) distance measures on the performance of Core Hunter. Second, we try to gain more insight into the performance of the original Core Hunter search algorithm by comparing it with several other heuristics on several realistic datasets of varying size and allelic composition. Finally, we propose a new algorithm (Mixed Replica search) for Core Hunter II with the aim of improving the diversity of the constructed core sets and reducing their generation times. RESULTS: Our results show that introducing minimum distance measures leads to core sets in which all accessions are sufficiently distant from each other, which was not always achieved when optimizing mean distance alone. Comparison of the original Core Hunter algorithm, Replica Exchange Monte Carlo (REMC), with simpler heuristics shows that the simpler algorithms often give very good results with lower runtimes than REMC. However, the simpler algorithms perform slightly worse than REMC under lower sampling intensities, and some heuristics clearly struggle with minimum distance measures. In comparison, the new advanced Mixed Replica search algorithm (MixRep), which uses heterogeneous replicas, was able to sample core sets with equal or higher diversity scores than REMC and the simpler heuristics, often using less computation time than REMC. CONCLUSION: The REMC search algorithm used in the original Core Hunter computer program performs well, sometimes leading to slightly better results than some of the simpler methods, although it does not always give the best results. Switching to the new Mixed Replica algorithm can significantly improve both overall results and runtimes. Finally, we recommend including minimum distance measures in the objective function when looking for core sets in which all accessions are sufficiently distant from each other. Core Hunter II is freely available as an open source project at http://www.corehunter.org.
format	Online Article Text
id	pubmed-3554476
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3554476 2013-01-29 Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search Beukelaer, Herman De; Smýkal, Petr; Davenport, Guy F; Fack, Veerle BMC Bioinformatics Research Article BioMed Central 2012-11-23 /pmc/articles/PMC3554476/ /pubmed/23174036 http://dx.doi.org/10.1186/1471-2105-13-312 Text en Copyright ©2012 De Beukelaer et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search
topic	Research Article
