Clustering the annotation space of proteins

BACKGROUND: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas. RESULTS: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. Thi...

Bibliographic Details
Main authors: Kunin, Victor; Ouzounis, Christos A
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC552314/
https://www.ncbi.nlm.nih.gov/pubmed/15703069
http://dx.doi.org/10.1186/1471-2105-6-24
_version_ 1782122479535259648
author Kunin, Victor
Ouzounis, Christos A
author_facet Kunin, Victor
Ouzounis, Christos A
author_sort Kunin, Victor
collection PubMed
description BACKGROUND: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas. RESULTS: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within minutes. It is also accurate, recovering consistent protein families agreeing on average in more than 97% with sequence-based protein families from Pfam. Discrepancies between sequence- and annotation-based clusters were scrutinized and the reasons reported. We demonstrate examples for each of these cases, and thoroughly discuss an example of a propagated error in SwissProt: a vacuolar ATPase subunit M9.2 erroneously annotated as vacuolar ATP synthase subunit H. CLAN algorithm is available from the authors and the CLAN database is accessible at CONCLUSIONS: CLAN creates refined function-and-sequence specific protein families that can be used for identification and annotation of unknown family members. It also allows easy identification of erroneous annotations by spotting inconsistencies between similarities on annotation and sequence levels.
format Text
id pubmed-552314
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5523142005-03-06 Clustering the annotation space of proteins Kunin, Victor Ouzounis, Christos A BMC Bioinformatics Research Article BACKGROUND: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas. RESULTS: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within minutes. It is also accurate, recovering consistent protein families agreeing on average in more than 97% with sequence-based protein families from Pfam. Discrepancies between sequence- and annotation-based clusters were scrutinized and the reasons reported. We demonstrate examples for each of these cases, and thoroughly discuss an example of a propagated error in SwissProt: a vacuolar ATPase subunit M9.2 erroneously annotated as vacuolar ATP synthase subunit H. CLAN algorithm is available from the authors and the CLAN database is accessible at CONCLUSIONS: CLAN creates refined function-and-sequence specific protein families that can be used for identification and annotation of unknown family members. It also allows easy identification of erroneous annotations by spotting inconsistencies between similarities on annotation and sequence levels. BioMed Central 2005-02-09 /pmc/articles/PMC552314/ /pubmed/15703069 http://dx.doi.org/10.1186/1471-2105-6-24 Text en Copyright © 2005 Kunin and Ouzounis; licensee BioMed Central Ltd.
spellingShingle Research Article
Kunin, Victor
Ouzounis, Christos A
Clustering the annotation space of proteins
title Clustering the annotation space of proteins
title_full Clustering the annotation space of proteins
title_fullStr Clustering the annotation space of proteins
title_full_unstemmed Clustering the annotation space of proteins
title_short Clustering the annotation space of proteins
title_sort clustering the annotation space of proteins
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC552314/
https://www.ncbi.nlm.nih.gov/pubmed/15703069
http://dx.doi.org/10.1186/1471-2105-6-24
work_keys_str_mv AT kuninvictor clusteringtheannotationspaceofproteins
AT ouzounischristosa clusteringtheannotationspaceofproteins