Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns
BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and o...
Main Authors: | Pieter Meysman, Cheng Zhou, Boris Cule, Bart Goethals, Kris Laukens |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4318390/ https://www.ncbi.nlm.nih.gov/pubmed/25657820 http://dx.doi.org/10.1186/s13040-015-0038-4 |
_version_ | 1782355842559901696 |
---|---|
author | Meysman, Pieter Zhou, Cheng Cule, Boris Goethals, Bart Laukens, Kris |
author_facet | Meysman, Pieter Zhou, Cheng Cule, Boris Goethals, Bart Laukens, Kris |
author_sort | Meysman, Pieter |
collection | PubMed |
description | BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-015-0038-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4318390 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4318390 2015-02-06 Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns Meysman, Pieter Zhou, Cheng Cule, Boris Goethals, Bart Laukens, Kris BioData Min Research BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-015-0038-4) contains supplementary material, which is available to authorized users.
BioMed Central 2015-01-31 /pmc/articles/PMC4318390/ /pubmed/25657820 http://dx.doi.org/10.1186/s13040-015-0038-4 Text en © Meysman et al.; licensee BioMed Central. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Meysman, Pieter Zhou, Cheng Cule, Boris Goethals, Bart Laukens, Kris Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns |
title | Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns |
title_full | Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns |
title_fullStr | Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns |
title_full_unstemmed | Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns |
title_short | Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns |
title_sort | mining the entire protein databank for frequent spatially cohesive amino acid patterns |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4318390/ https://www.ncbi.nlm.nih.gov/pubmed/25657820 http://dx.doi.org/10.1186/s13040-015-0038-4 |
work_keys_str_mv | AT meysmanpieter miningtheentireproteindatabankforfrequentspatiallycohesiveaminoacidpatterns AT zhoucheng miningtheentireproteindatabankforfrequentspatiallycohesiveaminoacidpatterns AT culeboris miningtheentireproteindatabankforfrequentspatiallycohesiveaminoacidpatterns AT goethalsbart miningtheentireproteindatabankforfrequentspatiallycohesiveaminoacidpatterns AT laukenskris miningtheentireproteindatabankforfrequentspatiallycohesiveaminoacidpatterns |