
uQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data

BACKGROUND: Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank...


Bibliographic Details
Main authors:	Adamczak, Rafal; Meller, Jarek
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5198500/ https://www.ncbi.nlm.nih.gov/pubmed/28031034 http://dx.doi.org/10.1186/s12859-016-1381-2

_version_	1782488861201399808
author	Adamczak, Rafal; Meller, Jarek
collection	PubMed
description	BACKGROUND: Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also implies large scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and development of efficient software solutions. RESULTS: uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust. CONCLUSION: uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1381-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5198500
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5198500 2016-12-30 uQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data Adamczak, Rafal; Meller, Jarek BMC Bioinformatics Software BioMed Central 2016-12-28 /pmc/articles/PMC5198500/ /pubmed/28031034 http://dx.doi.org/10.1186/s12859-016-1381-2 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	uQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5198500/ https://www.ncbi.nlm.nih.gov/pubmed/28031034 http://dx.doi.org/10.1186/s12859-016-1381-2
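
The abstract above mentions two ideas: a linear-time algorithm that implicitly compares all pairs of models, and profile hashing for clustering. The following is a minimal Python sketch of both ideas, not the uQlust implementation. It assumes each model has already been reduced to a fixed-length string of per-residue states (for example secondary-structure letters); the function names `rank_by_consensus` and `cluster_by_hash` and the toy profiles are hypothetical.

```python
from collections import Counter, defaultdict


def rank_by_consensus(profiles):
    """Score each profile by its average agreement with all profiles.

    Counting states per position once makes the cost O(N * L) for N
    profiles of length L, instead of comparing all N * N pairs explicitly.
    """
    if not profiles:
        return []
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all profiles must have the same length")
    # One pass over the data: how often does each state occur at each position?
    column_counts = [Counter(column) for column in zip(*profiles)]
    n = len(profiles)
    scores = []
    for p in profiles:
        # column_counts[i][s] is the number of profiles (including this one)
        # that share state s at position i, so the sum equals the total
        # number of positional matches against every profile in the set.
        matches = sum(column_counts[i][s] for i, s in enumerate(p))
        scores.append(matches / (n * length))
    return scores


def cluster_by_hash(profiles):
    """Group identical profiles by using the profile string as a hash key."""
    buckets = defaultdict(list)
    for index, p in enumerate(profiles):
        buckets[p].append(index)
    # Largest clusters first; ties keep first-seen order (stable sort).
    return sorted(buckets.values(), key=len, reverse=True)


# Toy profiles (assumed data): H = helix, E = strand, C = coil.
profiles = ["HHEC", "HHEC", "HHCC", "EECC"]
scores = rank_by_consensus(profiles)
clusters = cluster_by_hash(profiles)
```

In this toy case the first three profiles agree most with the set as a whole and score higher than the outlier, while hashing places the two identical profiles in one cluster. Real profiles would typically be coarse-grained before hashing so that similar, not only identical, structures share a key; that step is omitted here.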


Similar Items