A functional hierarchical organization of the protein sequence space

Bibliographic Details
Main authors: Kaplan, Noam, Friedlich, Moriah, Fromer, Menachem, Linial, Michal
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC544566/
https://www.ncbi.nlm.nih.gov/pubmed/15596019
http://dx.doi.org/10.1186/1471-2105-5-196
author Kaplan, Noam
Friedlich, Moriah
Fromer, Menachem
Linial, Michal
collection PubMed
description BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually validated alignments of known protein families. Such methods can achieve high sensitivity, but they are limited by the manual labor they require. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic, unsupervised, global clustering system that generates a hierarchical tree of over 1,000,000 proteins based solely on sequence similarity. RESULTS: In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure uses only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. As a result, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust. CONCLUSIONS: We present an automatically generated, unbiased method that provides a hierarchical classification of all currently known proteins.
format Text
id pubmed-544566
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Research Article), published 2004-12-14. Copyright © 2004 Kaplan et al; licensee BioMed Central Ltd.
title A functional hierarchical organization of the protein sequence space
topic Research Article