
Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?


Bibliographic Details
Main Authors: Tobi, Dror; Bahar, Ivet
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950313/
https://www.ncbi.nlm.nih.gov/pubmed/17598909
http://dx.doi.org/10.1186/1471-2105-8-226
Description
Summary: BACKGROUND: A wealth of unannotated and functionally unknown protein sequences has accumulated in recent years with rapid progress in sequence genomics, giving rise to ever-increasing demand for methods that efficiently assess functional sites. Sequence and structure conservation have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we examine the distributions of the 20^3 different types of 3-grams (triplets of sequentially contiguous amino acids) in the entire space of sequences accumulated to date in the UniProt database, and focus in particular on the rare 3-grams distinguished by their high entropy-based information content. RESULTS: Comparison of the UniProt distributions with those observed at or near the active sites in a non-redundant dataset of 59 enzyme/ligand complexes shows that the active sites preferentially recruit 3-grams distinguished by their low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results. CONCLUSION: The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides (for the first time) a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.
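
To make the idea of "rare 3-grams" concrete, the following is a minimal sketch of how 3-gram frequencies and an entropy-style rarity score could be computed. It is an illustration only, not the authors' method: the summary does not define the information content measure, so the sketch assumes plain self-information, -log2(frequency). The function names and the toy sequences are hypothetical and are not taken from UniProt.

```python
from collections import Counter
from math import log2


def count_3grams(sequences):
    """Count overlapping 3-grams (residue triplets) over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Slide a window of width 3 along the sequence, one residue at a time.
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    return counts


def information_content(counts):
    """Assumed rarity score: self-information -log2(p) of each 3-gram,
    where p is its relative frequency in the counted corpus."""
    total = sum(counts.values())
    return {gram: -log2(n / total) for gram, n in counts.items()}


# Toy sequences, invented for illustration (not real proteins).
toy_sequences = ["ACDACDACD", "ACDWCH"]

counts = count_3grams(toy_sequences)
info = information_content(counts)

# List 3-grams from rarest (highest score) to most common.
for gram, score in sorted(info.items(), key=lambda kv: -kv[1]):
    print(f"{gram}  count={counts[gram]}  info={score:.2f} bits")
```

Under this assumed score, a triplet that occurs once in the corpus ranks above one that occurs often; a site-level analysis would then ask whether residues at or near a functional site fall in high-scoring triplets more often than expected.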