Calibur: a tool for clustering large numbers of protein decoys

Bibliographic Details
Main authors: Li, Shuai Cheng; Ng, Yen Kaow
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881085/
https://www.ncbi.nlm.nih.gov/pubmed/20070892
http://dx.doi.org/10.1186/1471-2105-11-25
author Li, Shuai Cheng
Ng, Yen Kaow
collection PubMed
description BACKGROUND: Ab initio protein structure prediction methods generate numerous structural candidates, which are referred to as decoys. The decoy with the largest number of neighbors within a threshold distance is typically identified as the most representative decoy. However, the clustering of decoys needed for this criterion involves computations with runtimes that are at best quadratic in the number of decoys. As a result, there is currently no tool designed to exactly cluster very large numbers of decoys, which creates a bottleneck in the analysis. RESULTS: Using three strategies aimed at enhancing performance (proximate decoys organization, preliminary screening via lower and upper bounds, and outlier filtering), we designed and implemented a software tool for clustering decoys called Calibur. We show empirical results indicating the effectiveness of each of the strategies employed. The strategies are further fine-tuned according to their effectiveness. Calibur demonstrated the ability to scale well with respect to increases in the number of decoys. For a sample size of approximately 30 thousand decoys, Calibur completed the analysis in one third of the time required when the strategies are not used. For practical use, Calibur is able to automatically discover a suitable threshold distance for clustering from the input decoys. Several methods for this discovery are implemented in Calibur; by default, a very fast one is used. Using the default method, Calibur reported relatively good decoys in our tests. CONCLUSIONS: Calibur's ability to handle very large protein decoy sets makes it a useful tool for clustering decoys in ab initio protein structure prediction. As the number of decoys generated by these methods increases, we believe Calibur will become important for progress in the field.
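To make the selection criterion in the abstract concrete, here is a minimal Python sketch that counts, for each decoy, the other decoys within a threshold distance and picks the decoy with the most neighbors. It also shows one generic way to do "preliminary screening via lower and upper bounds": triangle-inequality bounds through a single reference decoy. This is an illustration under stated assumptions, not Calibur's implementation. Decoys are plain coordinate tuples, Euclidean distance stands in for RMSD (we assume only that the distance obeys the triangle inequality), and the function names and the `ref_index` parameter are hypothetical. Calibur's actual bounds, decoy organization, and outlier filtering may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Dist = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    # Stand-in for RMSD (assumption): any distance obeying the
    # triangle inequality keeps the bounds below valid.
    return math.dist(a, b)


def neighbor_counts_naive(decoys: Sequence[Point], threshold: float,
                          dist: Dist = euclidean) -> List[int]:
    """Count neighbors within `threshold` using all n*(n-1)/2 exact distances."""
    n = len(decoys)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if dist(decoys[i], decoys[j]) <= threshold:
                counts[i] += 1
                counts[j] += 1
    return counts


def neighbor_counts_screened(decoys: Sequence[Point], threshold: float,
                             dist: Dist = euclidean,
                             ref_index: int = 0) -> Tuple[List[int], int]:
    """Same counts, but skip an exact distance whenever bounds through a
    reference decoy r already decide the pair:
        |d(i,r) - d(j,r)| <= d(i,j) <= d(i,r) + d(j,r)
    Returns (counts, number of exact distance evaluations)."""
    n = len(decoys)
    ref = decoys[ref_index]
    to_ref = [dist(d, ref) for d in decoys]  # n exact distances up front
    exact_calls = n
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            lower = abs(to_ref[i] - to_ref[j])
            upper = to_ref[i] + to_ref[j]
            if lower > threshold:
                continue  # lower bound exceeds threshold: never neighbors
            if upper <= threshold:
                within = True  # upper bound within threshold: always neighbors
            else:
                exact_calls += 1  # bounds inconclusive: compute the distance
                within = dist(decoys[i], decoys[j]) <= threshold
            if within:
                counts[i] += 1
                counts[j] += 1
    return counts, exact_calls


def most_representative(counts: Sequence[int]) -> int:
    # Index of the decoy with the most neighbors (ties go to the lowest index).
    return max(range(len(counts)), key=lambda i: counts[i])


if __name__ == "__main__":
    decoys = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.5, 5.0)]
    counts, exact = neighbor_counts_screened(decoys, threshold=1.5)
    print(counts, exact)  # [2, 2, 2, 1, 1] 7
    print(most_representative(counts))  # 0
```

On the five toy decoys, both counting functions return [2, 2, 2, 1, 1] and select decoy 0, but the screened version evaluates 7 exact distances instead of 10. The pair loop itself is still quadratic. The saving is in the number of expensive distance evaluations, which for real decoys would be superposition-based RMSD computations.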
format Text
id pubmed-2881085
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-28810852010-06-05 Calibur: a tool for clustering large numbers of protein decoys Li, Shuai Cheng Ng, Yen Kaow BMC Bioinformatics Software
BioMed Central 2010-01-13 /pmc/articles/PMC2881085/ /pubmed/20070892 http://dx.doi.org/10.1186/1471-2105-11-25 Text en Copyright ©2010 Li and Ng; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Calibur: a tool for clustering large numbers of protein decoys
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881085/
https://www.ncbi.nlm.nih.gov/pubmed/20070892
http://dx.doi.org/10.1186/1471-2105-11-25