Predicting active site residue annotations in the Pfam database

BACKGROUND: Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules,...

Bibliographic Details
Main authors: Mistry, Jaina; Bateman, Alex; Finn, Robert D
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2025603/
https://www.ncbi.nlm.nih.gov/pubmed/17688688
http://dx.doi.org/10.1186/1471-2105-8-298
_version_ 1782136795715076096
author Mistry, Jaina
Bateman, Alex
Finn, Robert D
collection PubMed
description BACKGROUND: Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. DESCRIPTION: We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and MEROPS, we find that we make many novel predictions. On investigating the small subset of predictions made by these databases that are not predicted by us, we found that these sequences did not meet our strict criteria for prediction. We assessed the sensitivity and specificity of our methodology and estimate that only 3% of our predicted sequences are false positives. CONCLUSION: We have predicted 606,110 active site residues, of which 94% are not found in UniProtKB, and have increased the active site annotations in Pfam more than 200-fold. Although implemented for Pfam, the tool we have developed for transferring the data can be applied to any alignment with associated experimental active site data and is available for download. Our active site predictions are recalculated at each Pfam release to ensure they are comprehensive and up to date. They provide one of the largest available databases of active site annotation.
format Text
id pubmed-2025603
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2025603 2007-10-16 BMC Bioinformatics Database BioMed Central 2007-08-09 /pmc/articles/PMC2025603/ /pubmed/17688688 http://dx.doi.org/10.1186/1471-2105-8-298 Text en Copyright © 2007 Mistry et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting active site residue annotations in the Pfam database
topic Database