Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction

BACKGROUND: Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Compar...

Bibliographic Details
Main authors: Bryant, Drew H; Moll, Mark; Chen, Brian Y; Fofanov, Viacheslav Y; Kavraki, Lydia E
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885373/
https://www.ncbi.nlm.nih.gov/pubmed/20459833
http://dx.doi.org/10.1186/1471-2105-11-242
author Bryant, Drew H
Moll, Mark
Chen, Brian Y
Fofanov, Viacheslav Y
Kavraki, Lydia E
author_sort Bryant, Drew H
collection PubMed
description BACKGROUND: Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels.
RESULTS: This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine Substructural Clusters (SCs). SCs characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, segregation by conformation, and organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative motif for each protein cluster among the SCs determined by FASST to build motif ensembles that are shown through a series of function prediction experiments to improve the function prediction power of existing motifs.
CONCLUSIONS: FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used for the thorough investigation of structure-function relationships. The application of MESH allows for an automated, statistically rigorous procedure for incorporating structural variation data into protein function prediction pipelines. Our work provides an unbiased, automated assessment of the structural variability of identified binding site substructures among protein structure families and a technique for exploring the relation of substructural variation to protein function. As available proteomic data continues to expand, the techniques proposed will be indispensable for the large-scale analysis and interpretation of structural data.
format Text
id pubmed-2885373
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2885373 2010-06-15 Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction Bryant, Drew H Moll, Mark Chen, Brian Y Fofanov, Viacheslav Y Kavraki, Lydia E BMC Bioinformatics Research article BioMed Central 2010-05-11 /pmc/articles/PMC2885373/ /pubmed/20459833 http://dx.doi.org/10.1186/1471-2105-11-242 Text en Copyright ©2010 Bryant et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction
topic Research article