Large scale hierarchical clustering of protein sequences
BACKGROUND: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and cle...
Main authors: | Krause, Antje; Stoye, Jens; Vingron, Martin |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC547898/ https://www.ncbi.nlm.nih.gov/pubmed/15663796 http://dx.doi.org/10.1186/1471-2105-6-15 |
_version_ | 1782122307151462400 |
---|---|
author | Krause, Antje Stoye, Jens Vingron, Martin |
author_facet | Krause, Antje Stoye, Jens Vingron, Martin |
author_sort | Krause, Antje |
collection | PubMed |
description | BACKGROUND: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to. RESULTS: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at . CONCLUSIONS: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences. |
format | Text |
id | pubmed-547898 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-547898 2005-02-04 Large scale hierarchical clustering of protein sequences Krause, Antje Stoye, Jens Vingron, Martin BMC Bioinformatics Methodology Article BACKGROUND: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to. RESULTS: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at . CONCLUSIONS: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences. BioMed Central 2005-01-22 /pmc/articles/PMC547898/ /pubmed/15663796 http://dx.doi.org/10.1186/1471-2105-6-15 Text en Copyright © 2005 Krause et al; licensee BioMed Central Ltd. |
spellingShingle | Methodology Article Krause, Antje Stoye, Jens Vingron, Martin Large scale hierarchical clustering of protein sequences |
title | Large scale hierarchical clustering of protein sequences |
title_full | Large scale hierarchical clustering of protein sequences |
title_fullStr | Large scale hierarchical clustering of protein sequences |
title_full_unstemmed | Large scale hierarchical clustering of protein sequences |
title_short | Large scale hierarchical clustering of protein sequences |
title_sort | large scale hierarchical clustering of protein sequences |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC547898/ https://www.ncbi.nlm.nih.gov/pubmed/15663796 http://dx.doi.org/10.1186/1471-2105-6-15 |
work_keys_str_mv | AT krauseantje largescalehierarchicalclusteringofproteinsequences AT stoyejens largescalehierarchicalclusteringofproteinsequences AT vingronmartin largescalehierarchicalclusteringofproteinsequences |