Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map
Main authors: | Chen, Jianwen; Zheng, Shuangjia; Zhao, Huiying; Yang, Yuedong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7869490/ https://www.ncbi.nlm.nih.gov/pubmed/33557952 http://dx.doi.org/10.1186/s13321-021-00488-1 |
_version_ | 1783648640606142464 |
---|---|
author | Chen, Jianwen; Zheng, Shuangjia; Zhao, Huiying; Yang, Yuedong |
collection | PubMed |
description | Protein solubility is important for producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model that accurately predicts protein solubility from the amino acid sequence is highly desirable. Many methods have been developed, but they are mostly based on one-dimensional embeddings of amino acids, which are limited in capturing spatial structural information. In this study, we have developed a new structure-aware method, GraphSol, that predicts protein solubility with an attentive graph convolutional network (GCN), where the protein topology attribute graph is constructed from contact maps predicted from the sequence alone. GraphSol was shown to substantially outperform other sequence-based methods. The model proved to be stable, with a consistent $R^2$ of 0.48 in both the cross-validation and the independent test on the eSOL dataset. To the best of our knowledge, this is the first study to use a GCN for sequence-based protein solubility prediction. More importantly, this architecture could be easily extended to other protein prediction tasks that require only a raw protein sequence. |
format | Online Article Text |
id | pubmed-7869490 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-7869490 2021-02-08 Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map Chen, Jianwen Zheng, Shuangjia Zhao, Huiying Yang, Yuedong J Cheminform Research Article Protein solubility is important for producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. Therefore, a computational model that accurately predicts protein solubility from the amino acid sequence is highly desirable. Many methods have been developed, but they are mostly based on one-dimensional embeddings of amino acids, which are limited in capturing spatial structural information. In this study, we have developed a new structure-aware method, GraphSol, that predicts protein solubility with an attentive graph convolutional network (GCN), where the protein topology attribute graph is constructed from contact maps predicted from the sequence alone. GraphSol was shown to substantially outperform other sequence-based methods. The model proved to be stable, with a consistent $R^2$ of 0.48 in both the cross-validation and the independent test on the eSOL dataset. To the best of our knowledge, this is the first study to use a GCN for sequence-based protein solubility prediction. More importantly, this architecture could be easily extended to other protein prediction tasks that require only a raw protein sequence. Springer International Publishing 2021-02-08 /pmc/articles/PMC7869490/ /pubmed/33557952 http://dx.doi.org/10.1186/s13321-021-00488-1 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7869490/ https://www.ncbi.nlm.nih.gov/pubmed/33557952 http://dx.doi.org/10.1186/s13321-021-00488-1 |
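The record's description outlines the core idea: residues become graph nodes, predicted residue-residue contacts become edges, and a graph convolutional network aggregates neighbour features before an attentive readout produces one solubility score. The following is a minimal, dependency-free Python sketch of that general idea under stated assumptions, not GraphSol's actual implementation. The 0.5 contact threshold, the Kipf and Welling normalization D^-1/2 (A + I) D^-1/2, the softmax attention readout, the sigmoid output, every helper name (`contact_map_to_adjacency`, `predict_solubility`, and so on) and all toy numbers are hypothetical choices made for illustration. The sketch also includes the standard coefficient of determination (R²), a common metric for evaluating regressors of this kind.

```python
import math
from typing import List

Matrix = List[List[float]]  # plain nested lists keep the sketch dependency-free


def contact_map_to_adjacency(probs: Matrix, threshold: float = 0.5) -> Matrix:
    """Turn a predicted contact-probability map into a 0/1 adjacency matrix.

    Residues i != j are linked when their predicted contact probability is at
    least `threshold` (0.5 is an illustrative assumption, not GraphSol's value).
    """
    n = len(probs)
    return [[1.0 if i != j and probs[i][j] >= threshold else 0.0
             for j in range(n)] for i in range(n)]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (Kipf and Welling)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product, adequate for toy-sized graphs."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution, ReLU(A_norm @ H @ W): mixes each residue with its contacts."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def attention_pool(h: Matrix, score_weights: List[float]) -> List[float]:
    """Softmax-weighted sum of residue embeddings into one protein-level vector."""
    scores = [sum(x * s for x, s in zip(row, score_weights)) for row in h]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [sum(alphas[i] * h[i][d] for i in range(len(h)))
            for d in range(len(h[0]))]


def predict_solubility(probs: Matrix, features: Matrix, w1: Matrix,
                       score_weights: List[float], w_out: List[float],
                       bias: float = 0.0) -> float:
    """Contact map -> graph -> one GCN layer -> attentive readout -> sigmoid in (0, 1)."""
    a_norm = normalize_adjacency(contact_map_to_adjacency(probs))
    h = gcn_layer(a_norm, features, w1)
    pooled = attention_pool(h, score_weights)
    logit = sum(p * w for p, w in zip(pooled, w_out)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical 4-residue protein; every number here is made up for illustration.
    probs = [[1.0, 0.9, 0.2, 0.1],
             [0.9, 1.0, 0.8, 0.3],
             [0.2, 0.8, 1.0, 0.7],
             [0.1, 0.3, 0.7, 1.0]]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    score = predict_solubility(probs, features, identity, [1.0, 1.0], [0.5, -0.5])
    print(f"predicted solubility: {score:.3f}")
```

Plain lists keep the sketch runnable without third-party packages. A practical model would instead use a tensor library, richer per-residue features, several stacked layers, and weights learned from labeled data such as the eSOL dataset mentioned in the description.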