Super paramagnetic clustering of protein sequences

BACKGROUND: Detection of sequence homologues represents a challenging task that is important for the discovery of protein families and the reliable application of automatic annotation methods. The presence of domains in protein families of diverse function, inhomogeneity and different sizes of prote...

Bibliographic Details
Main authors: Tetko, Igor V, Facius, Axel, Ruepp, Andreas, Mewes, Hans-Werner
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1084344/
https://www.ncbi.nlm.nih.gov/pubmed/15804359
http://dx.doi.org/10.1186/1471-2105-6-82
_version_ 1782123797259747328
author Tetko, Igor V
Facius, Axel
Ruepp, Andreas
Mewes, Hans-Werner
author_facet Tetko, Igor V
Facius, Axel
Ruepp, Andreas
Mewes, Hans-Werner
author_sort Tetko, Igor V
collection PubMed
description BACKGROUND: Detection of sequence homologues represents a challenging task that is important for the discovery of protein families and the reliable application of automatic annotation methods. The presence of domains in protein families of diverse function, inhomogeneity and different sizes of protein families create considerable difficulties for the application of published clustering methods. RESULTS: Our work analyses the Super Paramagnetic Clustering (SPC) and its extension, global SPC (gSPC) algorithm. These algorithms cluster input data based on a method that is analogous to the treatment of an inhomogeneous ferromagnet in physics. For the SwissProt and SCOP databases we show that the gSPC improves the specificity and sensitivity of clustering over the original SPC and Markov Cluster algorithm (TRIBE-MCL) up to 30%. The three algorithms provided similar results for the MIPS FunCat 1.3 annotation of four bacterial genomes, Bacillus subtilis, Helicobacter pylori, Listeria innocua and Listeria monocytogenes. However, the gSPC covered about 12% more sequences compared to the other methods. The SPC algorithm was programmed in house using C++ and it is available at . The FunCat annotation is available at . CONCLUSION: The gSPC calculated to a higher accuracy or covered a larger number of sequences than the TRIBE-MCL algorithm. Thus it is a useful approach for automatic detection of protein families and unsupervised annotation of full genomes.
format Text
id pubmed-1084344
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10843442005-04-23 Super paramagnetic clustering of protein sequences Tetko, Igor V Facius, Axel Ruepp, Andreas Mewes, Hans-Werner BMC Bioinformatics Methodology Article BACKGROUND: Detection of sequence homologues represents a challenging task that is important for the discovery of protein families and the reliable application of automatic annotation methods. The presence of domains in protein families of diverse function, inhomogeneity and different sizes of protein families create considerable difficulties for the application of published clustering methods. RESULTS: Our work analyses the Super Paramagnetic Clustering (SPC) and its extension, global SPC (gSPC) algorithm. These algorithms cluster input data based on a method that is analogous to the treatment of an inhomogeneous ferromagnet in physics. For the SwissProt and SCOP databases we show that the gSPC improves the specificity and sensitivity of clustering over the original SPC and Markov Cluster algorithm (TRIBE-MCL) up to 30%. The three algorithms provided similar results for the MIPS FunCat 1.3 annotation of four bacterial genomes, Bacillus subtilis, Helicobacter pylori, Listeria innocua and Listeria monocytogenes. However, the gSPC covered about 12% more sequences compared to the other methods. The SPC algorithm was programmed in house using C++ and it is available at . The FunCat annotation is available at . CONCLUSION: The gSPC calculated to a higher accuracy or covered a larger number of sequences than the TRIBE-MCL algorithm. Thus it is a useful approach for automatic detection of protein families and unsupervised annotation of full genomes. BioMed Central 2005-04-01 /pmc/articles/PMC1084344/ /pubmed/15804359 http://dx.doi.org/10.1186/1471-2105-6-82 Text en Copyright © 2005 Tetko et al; licensee BioMed Central Ltd.
spellingShingle Methodology Article
Tetko, Igor V
Facius, Axel
Ruepp, Andreas
Mewes, Hans-Werner
Super paramagnetic clustering of protein sequences
title Super paramagnetic clustering of protein sequences
title_full Super paramagnetic clustering of protein sequences
title_fullStr Super paramagnetic clustering of protein sequences
title_full_unstemmed Super paramagnetic clustering of protein sequences
title_short Super paramagnetic clustering of protein sequences
title_sort super paramagnetic clustering of protein sequences
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1084344/
https://www.ncbi.nlm.nih.gov/pubmed/15804359
http://dx.doi.org/10.1186/1471-2105-6-82
work_keys_str_mv AT tetkoigorv superparamagneticclusteringofproteinsequences
AT faciusaxel superparamagneticclusteringofproteinsequences
AT rueppandreas superparamagneticclusteringofproteinsequences
AT meweshanswerner superparamagneticclusteringofproteinsequences