
Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE

BACKGROUND: Protein solubility is a precondition for efficient heterologous protein expression at the basis of most industrial applications and for functional interpretation in basic research. However, recurrent formation of inclusion bodies is still an inevitable roadblock in protein science and in...


Bibliographic Details
Main authors: Wang, Chao; Zou, Quan
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9875434/
https://www.ncbi.nlm.nih.gov/pubmed/36694239
http://dx.doi.org/10.1186/s12915-023-01510-8
_version_ 1784877960094810112
author Wang, Chao
Zou, Quan
collection PubMed
description BACKGROUND: Protein solubility is a precondition for efficient heterologous protein expression at the basis of most industrial applications and for functional interpretation in basic research. However, recurrent formation of inclusion bodies is still an inevitable roadblock in protein science and industry, where only nearly a quarter of proteins can be successfully expressed in soluble form. Despite numerous solubility prediction models having been developed over time, their performance remains unsatisfactory in the context of the current strong increase in available protein sequences. Hence, it is imperative to develop novel and highly accurate predictors that enable the prioritization of highly soluble proteins to reduce the cost of actual experimental work. RESULTS: In this study, we developed a novel tool, DeepSoluE, which predicts protein solubility using a long-short-term memory (LSTM) network with hybrid features composed of physicochemical patterns and distributed representation of amino acids. Comparison results showed that the proposed model achieved more accurate and balanced performance than existing tools. Furthermore, we explored specific features that have a dominant impact on the model performance as well as their interaction effects. CONCLUSIONS: DeepSoluE is suitable for the prediction of protein solubility in E. coli; it serves as a bioinformatics tool for prescreening of potentially soluble targets to reduce the cost of wet-experimental studies. The publicly available webserver is freely accessible at http://lab.malab.cn/~wangchao/softs/DeepSoluE/. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12915-023-01510-8.
format Online
Article
Text
id pubmed-9875434
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9875434 2023-01-26 BMC Biol Methodology Article
BioMed Central 2023-01-24 /pmc/articles/PMC9875434/ /pubmed/36694239 http://dx.doi.org/10.1186/s12915-023-01510-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE
topic Methodology Article