Large scale hierarchical clustering of protein sequences

BACKGROUND: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and cle...

Bibliographic Details
Main authors: Krause, Antje; Stoye, Jens; Vingron, Martin
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC547898/
https://www.ncbi.nlm.nih.gov/pubmed/15663796
http://dx.doi.org/10.1186/1471-2105-6-15
author Krause, Antje
Stoye, Jens
Vingron, Martin
collection PubMed
description BACKGROUND: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to. RESULTS: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at . CONCLUSIONS: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.
format Text
id pubmed-547898
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-547898 2005-02-04 Large scale hierarchical clustering of protein sequences Krause, Antje Stoye, Jens Vingron, Martin BMC Bioinformatics Methodology Article BioMed Central 2005-01-22 /pmc/articles/PMC547898/ /pubmed/15663796 http://dx.doi.org/10.1186/1471-2105-6-15 Text en Copyright © 2005 Krause et al; licensee BioMed Central Ltd.
title Large scale hierarchical clustering of protein sequences
topic Methodology Article