Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression
BACKGROUND: Hinge-bending movements in proteins comprising two or more domains form a large class of functional movements. Hinge-bending regions demarcate protein domains and collectively control the domain movement. Consequently, the ability to recognise sequence features of hinge-bending regions a...
Main authors: | Veevers, Ruth; Cawley, Gavin; Hayward, Steven |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147021/ https://www.ncbi.nlm.nih.gov/pubmed/32272894 http://dx.doi.org/10.1186/s12859-020-3464-3 |
_version_ | 1783520335110340608 |
---|---|
author | Veevers, Ruth; Cawley, Gavin; Hayward, Steven |
author_facet | Veevers, Ruth; Cawley, Gavin; Hayward, Steven |
author_sort | Veevers, Ruth |
collection | PubMed |
description | BACKGROUND: Hinge-bending movements in proteins comprising two or more domains form a large class of functional movements. Hinge-bending regions demarcate protein domains and collectively control the domain movement. Consequently, the ability to recognise sequence features of hinge-bending regions and to be able to predict them from sequence alone would benefit various areas of protein research. For example, an understanding of how the sequence features of these regions relate to dynamic properties in multi-domain proteins would aid in the rational design of linkers in therapeutic fusion proteins. RESULTS: The DynDom database of protein domain movements comprises sequences annotated to indicate whether the amino acid residue is located within a hinge-bending region or within an intradomain region. Using statistical methods and Kernel Logistic Regression (KLR) models, this data was used to determine sequence features that favour or disfavour hinge-bending regions. This is a difficult classification problem as the number of negative cases (intradomain residues) is much larger than the number of positive cases (hinge residues). The statistical methods and the KLR models both show that cysteine has the lowest propensity for hinge-bending regions and proline has the highest, even though it is the most rigid amino acid. As hinge-bending regions have been previously shown to occur frequently at the terminal regions of the secondary structures, the propensity for proline at these regions is likely due to its tendency to break secondary structures. The KLR models also indicate that isoleucine may act as a domain-capping residue. We have found that a quadratic KLR model outperforms a linear KLR model and that improvement in performance occurs up to very long window lengths (eighty residues) indicating long-range correlations. 
CONCLUSION: In contrast to the only other approach that focused solely on interdomain hinge-bending regions, the method provides a modest and statistically significant improvement over a random classifier. An explanation of the KLR results is that in the prediction of hinge-bending regions a long-range correlation is at play between a small number of amino acids that either favour or disfavour hinge-bending regions. The resulting sequence-based prediction tool, HingeSeek, is available to run through a webserver at hingeseek.cmp.uea.ac.uk. |
format | Online Article Text |
id | pubmed-7147021 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-71470212020-04-18 Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression Veevers, Ruth Cawley, Gavin Hayward, Steven BMC Bioinformatics Research Article BACKGROUND: Hinge-bending movements in proteins comprising two or more domains form a large class of functional movements. Hinge-bending regions demarcate protein domains and collectively control the domain movement. Consequently, the ability to recognise sequence features of hinge-bending regions and to be able to predict them from sequence alone would benefit various areas of protein research. For example, an understanding of how the sequence features of these regions relate to dynamic properties in multi-domain proteins would aid in the rational design of linkers in therapeutic fusion proteins. RESULTS: The DynDom database of protein domain movements comprises sequences annotated to indicate whether the amino acid residue is located within a hinge-bending region or within an intradomain region. Using statistical methods and Kernel Logistic Regression (KLR) models, this data was used to determine sequence features that favour or disfavour hinge-bending regions. This is a difficult classification problem as the number of negative cases (intradomain residues) is much larger than the number of positive cases (hinge residues). The statistical methods and the KLR models both show that cysteine has the lowest propensity for hinge-bending regions and proline has the highest, even though it is the most rigid amino acid. As hinge-bending regions have been previously shown to occur frequently at the terminal regions of the secondary structures, the propensity for proline at these regions is likely due to its tendency to break secondary structures. The KLR models also indicate that isoleucine may act as a domain-capping residue. 
We have found that a quadratic KLR model outperforms a linear KLR model and that improvement in performance occurs up to very long window lengths (eighty residues) indicating long-range correlations. CONCLUSION: In contrast to the only other approach that focused solely on interdomain hinge-bending regions, the method provides a modest and statistically significant improvement over a random classifier. An explanation of the KLR results is that in the prediction of hinge-bending regions a long-range correlation is at play between a small number of amino acids that either favour or disfavour hinge-bending regions. The resulting sequence-based prediction tool, HingeSeek, is available to run through a webserver at hingeseek.cmp.uea.ac.uk. BioMed Central 2020-04-09 /pmc/articles/PMC7147021/ /pubmed/32272894 http://dx.doi.org/10.1186/s12859-020-3464-3 Text en © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Veevers, Ruth Cawley, Gavin Hayward, Steven Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression |
title | Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression |
title_full | Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression |
title_fullStr | Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression |
title_full_unstemmed | Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression |
title_short | Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression |
title_sort | investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147021/ https://www.ncbi.nlm.nih.gov/pubmed/32272894 http://dx.doi.org/10.1186/s12859-020-3464-3 |
work_keys_str_mv | AT veeversruth investigationofsequencefeaturesofhingebendingregionsinproteinswithdomainmovementsusingkernellogisticregression AT cawleygavin investigationofsequencefeaturesofhingebendingregionsinproteinswithdomainmovementsusingkernellogisticregression AT haywardsteven investigationofsequencefeaturesofhingebendingregionsinproteinswithdomainmovementsusingkernellogisticregression |
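
The abstract above describes per-residue classification with kernel logistic regression (KLR) over a sequence window, a quadratic kernel, and a strong imbalance between hinge and intradomain residues. The following is a minimal sketch of that general idea, not the authors' implementation and not the HingeSeek code. Everything specific in it is an assumption made for illustration: the one-hot window encoding, the kernel form `(1 + x·z)^2`, the class weighting used to offset the imbalance, the plain gradient-style update, the hyperparameter values, and the toy sequence and labels, which are invented and do not come from DynDom.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(seq, center, half_width):
    """One-hot encode the residues in a window centred on `center`.

    Window positions that fall outside the sequence (or hold an unknown
    letter) are left as all-zero blocks. This encoding is an assumption.
    """
    feats = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:
                one_hot[idx] = 1.0
        feats.extend(one_hot)
    return feats


def quadratic_kernel(x, z):
    """Inhomogeneous polynomial kernel of degree 2: (1 + x.z)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_klr(X, y, lam=0.1, lr=0.001, epochs=300):
    """Fit dual coefficients for a class-weighted, regularised KLR model.

    The true gradient with respect to alpha is K @ (resid + lam * alpha);
    this sketch steps along -(resid + lam * alpha), which is still a
    descent direction because K is positive semi-definite.
    """
    n = len(X)
    K = [[quadratic_kernel(a, b) for b in X] for a in X]
    n_pos = sum(y)
    # Up-weight the rare positive (hinge) class; an illustrative choice.
    pos_weight = (n - n_pos) / max(n_pos, 1)
    w = [pos_weight if t == 1 else 1.0 for t in y]
    alpha = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        scores = [sum(K[i][j] * alpha[j] for j in range(n)) + bias
                  for i in range(n)]
        resid = [w[i] * (sigmoid(scores[i]) - y[i]) for i in range(n)]
        alpha = [alpha[i] - lr * (resid[i] + lam * alpha[i])
                 for i in range(n)]
        bias -= lr * sum(resid) / n
    return alpha, bias


def predict_proba(x, X, alpha, bias):
    """Estimated probability that the window `x` is centred on a hinge residue."""
    score = sum(a * quadratic_kernel(x, xi) for a, xi in zip(alpha, X))
    return sigmoid(score + bias)


# Hypothetical toy data: an invented sequence with residues 6-9 marked as hinge.
toy_seq = "MKVLAAGPGSIEELLK"
toy_labels = [1 if 6 <= i <= 9 else 0 for i in range(len(toy_seq))]
toy_X = [window_features(toy_seq, i, 1) for i in range(len(toy_seq))]
toy_alpha, toy_bias = train_klr(toy_X, toy_labels)
toy_probs = [predict_proba(x, toy_X, toy_alpha, toy_bias) for x in toy_X]
print([round(p, 3) for p in toy_probs])
```

The window here spans three residues (`half_width=1`) only to keep the toy example small; the abstract reports that performance kept improving up to window lengths of eighty residues, which in this sketch would correspond to a much larger `half_width` and far more training data than a single short sequence.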