
SUPFAM: A database of sequence superfamilies of protein domains

BACKGROUND: The SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated by profile matching to give sequence superfamilies of...


Bibliographic Details
Main Authors: Pandit, Shashi B; Bhadra, Rana; Gowri, VS; Balaji, S; Anand, B; Srinivasan, N
Format: Text
Language: English
Published: BioMed Central 2004
Subjects: Database
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC394316/
https://www.ncbi.nlm.nih.gov/pubmed/15113407
http://dx.doi.org/10.1186/1471-2105-5-28
author Pandit, Shashi B
Bhadra, Rana
Gowri, VS
Balaji, S
Anand, B
Srinivasan, N
collection PubMed
description BACKGROUND: The SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated by profile matching to give sequence superfamilies of known structure. All-against-all family profile matches are then made to deduce a list of new potential superfamilies of as yet unknown structure.
DESCRIPTION: The current version of SUPFAM (release 1.4) incorporates significant enhancements and major developments over the earlier, basic version. The present version uses RPS-BLAST, which is robust and sensitive, for profile matching. Connections between protein families are made more reliable than before by benchmarked criteria: a strict e-value cut-off and a minimum alignment length. An e-value-based indication of the reliability of each connection is now presented in the database. The current release also provides web access to an RPS-BLAST-based tool that associates a query sequence with one of the family profiles in SUPFAM. In terms of scientific content, the present release is entirely reorganized and uses 6190 Pfam families and 2317 structural families derived from SCOP. Owing to the steep increase in the number of sequence and structural families used, the scientific content of the present release is almost entirely complementary to that of the previous basic version. Of the 2286 families, we could relate 245 Pfam families with apparently no structural information to families of known 3-D structure, thus identifying new families in the existing superfamilies. Using the profiles of 3904 Pfam families of as yet unknown structure, an all-against-all comparison involving sequence-profile matching clustered 96 Pfam families into 39 new potential superfamilies.
CONCLUSION: SUPFAM presents many non-trivial superfamily relationships among sequence families involved in a variety of functions, so its content is of interest to a wide scientific community. The grouping of related proteins without a known structure in SUPFAM is useful for identifying priority targets for structural genomics initiatives and for assigning putative functions. Database URL: .
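The abstract describes two steps that a short sketch may make more concrete: accepting a family-to-family connection only when it passes an e-value cut-off and a minimum alignment length, and then grouping connected families into potential superfamilies. The Python below is an illustration only, not the authors' implementation. The threshold values, the family names, and the use of connected components (union-find) as the grouping rule are all assumptions, since the abstract states none of them.

```python
from collections import defaultdict

# Hypothetical thresholds: the abstract mentions a strict e-value cut-off and a
# minimal alignment length but does not give the actual values.
MAX_EVALUE = 1e-3
MIN_ALIGN_LEN = 50


def filter_hits(hits, max_evalue=MAX_EVALUE, min_align_len=MIN_ALIGN_LEN):
    """Keep (family_a, family_b, evalue, align_len) hits that pass both criteria."""
    return [
        (a, b, e, n)
        for (a, b, e, n) in hits
        if a != b and e <= max_evalue and n >= min_align_len
    ]


def cluster_families(hits):
    """Group families linked by accepted hits (connected components, an assumption)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _evalue, _length in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for family in parent:
        groups[find(family)].add(family)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up profile-match results; these family names are placeholders.
hits = [
    ("FamA", "FamB", 1e-10, 120),
    ("FamB", "FamC", 1e-5, 80),
    ("FamD", "FamE", 1e-4, 60),
    ("FamA", "FamD", 0.5, 200),  # rejected: e-value above the cut-off
    ("FamE", "FamF", 1e-8, 20),  # rejected: alignment too short
]

clusters = cluster_families(filter_hits(hits))
for members in clusters:
    print(members)
```

With these made-up inputs, three hits pass the filter and the families fall into two groups. A family that has no accepted connection (FamF here) does not appear in any group.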
format Text
id pubmed-394316
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-394316 2004-04-22 BMC Bioinformatics Database BioMed Central 2004-03-15 /pmc/articles/PMC394316/ /pubmed/15113407 http://dx.doi.org/10.1186/1471-2105-5-28 Text en Copyright © 2004 Pandit et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title SUPFAM: A database of sequence superfamilies of protein domains
topic Database
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC394316/
https://www.ncbi.nlm.nih.gov/pubmed/15113407
http://dx.doi.org/10.1186/1471-2105-5-28