
Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns

BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and o...


Bibliographic Details
Main authors: Meysman, Pieter; Zhou, Cheng; Cule, Boris; Goethals, Bart; Laukens, Kris
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4318390/
https://www.ncbi.nlm.nih.gov/pubmed/25657820
http://dx.doi.org/10.1186/s13040-015-0038-4
author Meysman, Pieter
Zhou, Cheng
Cule, Boris
Goethals, Bart
Laukens, Kris
author_sort Meysman, Pieter
collection PubMed
description BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, which we term FreSCOs (Frequent Spatially Cohesive Component sets) and which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, we examined them in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent of protein function or specific conserved domains. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-015-0038-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4318390
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4318390 2015-02-06 Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns Meysman, Pieter Zhou, Cheng Cule, Boris Goethals, Bart Laukens, Kris BioData Min Research
BioMed Central 2015-01-31 /pmc/articles/PMC4318390/ /pubmed/25657820 http://dx.doi.org/10.1186/s13040-015-0038-4 Text en © Meysman et al.; licensee BioMed Central. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns
title_sort mining the entire protein databank for frequent spatially cohesive amino acid patterns
topic Research