fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data

Motivation: fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. Results: fast_protein_cluster is an optimized and extensible toolkit that suppor...

Bibliographic Details
Main Authors:	Hung, Ling-Hong; Samudrala, Ram
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	Applications Notes
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058946/ https://www.ncbi.nlm.nih.gov/pubmed/24532722 http://dx.doi.org/10.1093/bioinformatics/btu098

collection	PubMed
description	Motivation: fast_protein_cluster is a fast, parallel and memory-efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. Results: fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. Availability and implementation: fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the MIT license (http://software.compbio.washington.edu/fast_protein_cluster). Contact: lhhung@compbio.washington.edu Supplementary information: Supplementary data are available at Bioinformatics online.
id	pubmed-4058946
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4058946 2014-06-18 Bioinformatics Applications Notes Oxford University Press 2014-06-15 2014-02-14 /pmc/articles/PMC4058946/ /pubmed/24532722 http://dx.doi.org/10.1093/bioinformatics/btu098 Text en © The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.