A functional hierarchical organization of the protein sequence space
BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sen...
Main Authors: | Kaplan, Noam; Friedlich, Moriah; Fromer, Menachem; Linial, Michal |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2004 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC544566/ https://www.ncbi.nlm.nih.gov/pubmed/15596019 http://dx.doi.org/10.1186/1471-2105-5-196 |
_version_ | 1782122151687487488 |
---|---|
author | Kaplan, Noam Friedlich, Moriah Fromer, Menachem Linial, Michal |
author_facet | Kaplan, Noam Friedlich, Moriah Fromer, Menachem Linial, Michal |
author_sort | Kaplan, Noam |
collection | PubMed |
description | BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity. RESULTS: In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust. CONCLUSIONS: We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins. |
format | Text |
id | pubmed-544566 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-544566 2005-01-16 A functional hierarchical organization of the protein sequence space Kaplan, Noam Friedlich, Moriah Fromer, Menachem Linial, Michal BMC Bioinformatics Research Article BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity. RESULTS: In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust. CONCLUSIONS: We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins. BioMed Central 2004-12-14 /pmc/articles/PMC544566/ /pubmed/15596019 http://dx.doi.org/10.1186/1471-2105-5-196 Text en Copyright © 2004 Kaplan et al; licensee BioMed Central Ltd. |
spellingShingle | Research Article Kaplan, Noam Friedlich, Moriah Fromer, Menachem Linial, Michal A functional hierarchical organization of the protein sequence space |
title | A functional hierarchical organization of the protein sequence space |
title_full | A functional hierarchical organization of the protein sequence space |
title_fullStr | A functional hierarchical organization of the protein sequence space |
title_full_unstemmed | A functional hierarchical organization of the protein sequence space |
title_short | A functional hierarchical organization of the protein sequence space |
title_sort | functional hierarchical organization of the protein sequence space |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC544566/ https://www.ncbi.nlm.nih.gov/pubmed/15596019 http://dx.doi.org/10.1186/1471-2105-5-196 |
work_keys_str_mv | AT kaplannoam afunctionalhierarchicalorganizationoftheproteinsequencespace AT friedlichmoriah afunctionalhierarchicalorganizationoftheproteinsequencespace AT fromermenachem afunctionalhierarchicalorganizationoftheproteinsequencespace AT linialmichal afunctionalhierarchicalorganizationoftheproteinsequencespace AT kaplannoam functionalhierarchicalorganizationoftheproteinsequencespace AT friedlichmoriah functionalhierarchicalorganizationoftheproteinsequencespace AT fromermenachem functionalhierarchicalorganizationoftheproteinsequencespace AT linialmichal functionalhierarchicalorganizationoftheproteinsequencespace |