Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

BACKGROUND: Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently...

Bibliographic Details
Main Authors: Durston, Kirk K, Chiu, David KY, Wong, Andrew KC, Li, Gary CL
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524763/
https://www.ncbi.nlm.nih.gov/pubmed/22793672
http://dx.doi.org/10.1186/1687-4153-2012-8
_version_ 1782253363560185856
author Durston, Kirk K
Chiu, David KY
Wong, Andrew KC
Li, Gary CL
author_facet Durston, Kirk K
Chiu, David KY
Wong, Andrew KC
Li, Gary CL
author_sort Durston, Kirk K
collection PubMed
description BACKGROUND: Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recent work indicates that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. RESULTS: The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams, which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function. CONCLUSIONS: Our results demonstrate that the method we present here, using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins, can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
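The abstract names two ingredients: a normalized mutual information measure between aligned sites, and a k-modes clustering of sites that maximizes interdependence within each group. The sketch below illustrates that general idea only and is not the authors' implementation. It assumes that mutual information is normalized by the joint entropy of the two columns, that a cluster's "mode" is the member site with the largest summed dependence on the other members, and that the first k sites serve as initial modes. The function names `entropy`, `normalized_mi`, and `k_modes_sites` are hypothetical, and this record does not specify the paper's exact normalization or initialization.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def normalized_mi(col_x, col_y):
    """Mutual information of two aligned columns divided by their joint entropy.

    Assumption: this normalization is one common choice, not necessarily the paper's.
    """
    h_xy = entropy(list(zip(col_x, col_y)))
    if h_xy == 0.0:  # both columns fully conserved: no measurable dependence
        return 0.0
    return (entropy(col_x) + entropy(col_y) - h_xy) / h_xy


def k_modes_sites(columns, k, max_iter=20):
    """Group alignment columns (sites) into k clusters around 'mode' sites."""
    n = len(columns)
    # Pairwise dependence matrix between all sites.
    r = [[normalized_mi(columns[i], columns[j]) for j in range(n)] for i in range(n)]
    modes = list(range(k))  # naive initialisation: first k sites (assumption)
    clusters = []
    for _ in range(max_iter):
        # Assign every site to the mode it depends on most strongly.
        clusters = [[] for _ in range(k)]
        for i in range(n):
            if i in modes:
                clusters[modes.index(i)].append(i)
            else:
                best = max(range(k), key=lambda c: r[i][modes[c]])
                clusters[best].append(i)
        # Re-pick each mode as the member with the largest summed dependence.
        new_modes = [
            max(members, key=lambda i: sum(r[i][j] for j in members if j != i))
            for members in clusters
        ]
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return modes, clusters
```

Each element of `columns` is one aligned site read down the alignment, for example a string with one residue per sequence. On a toy alignment with the columns `"AAGG"`, `"CCTT"`, `"AGAG"`, `"TCTC"` and `k=2`, the first two sites fall into one cluster and the last two into the other, because each pair co-varies perfectly while the pairs are independent of each other.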
format Online
Article
Text
id pubmed-3524763
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3524763 2013-01-08 Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring Durston, Kirk K Chiu, David KY Wong, Andrew KC Li, Gary CL EURASIP J Bioinform Syst Biol Research BioMed Central 2012 2012-07-13 /pmc/articles/PMC3524763/ /pubmed/22793672 http://dx.doi.org/10.1186/1687-4153-2012-8 Text en Copyright ©2012 Durston et al.; licensee Springer. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Durston, Kirk K
Chiu, David KY
Wong, Andrew KC
Li, Gary CL
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
title Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
title_full Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
title_fullStr Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
title_full_unstemmed Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
title_short Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
title_sort statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524763/
https://www.ncbi.nlm.nih.gov/pubmed/22793672
http://dx.doi.org/10.1186/1687-4153-2012-8
work_keys_str_mv AT durstonkirkk statisticaldiscoveryofsiteinterdependenciesinsubmolecularhierarchicalproteinstructuring
AT chiudavidky statisticaldiscoveryofsiteinterdependenciesinsubmolecularhierarchicalproteinstructuring
AT wongandrewkc statisticaldiscoveryofsiteinterdependenciesinsubmolecularhierarchicalproteinstructuring
AT ligarycl statisticaldiscoveryofsiteinterdependenciesinsubmolecularhierarchicalproteinstructuring