DSResSol: A Sequence-Based Solubility Predictor Created with Dilated Squeeze Excitation Residual Networks
Protein solubility is an important thermodynamic parameter that is critical for the characterization of a protein’s function, and a key determinant for the production yield of a protein in both the research setting and within industrial (e.g., pharmaceutical) applications. Experimental approaches to...
Main authors: | Madani, Mohammad; Lin, Kaixiang; Tarakanova, Anna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704505/ https://www.ncbi.nlm.nih.gov/pubmed/34948354 http://dx.doi.org/10.3390/ijms222413555 |
_version_ | 1784621722717126656 |
---|---|
author | Madani, Mohammad Lin, Kaixiang Tarakanova, Anna |
author_facet | Madani, Mohammad Lin, Kaixiang Tarakanova, Anna |
author_sort | Madani, Mohammad |
collection | PubMed |
description | Protein solubility is an important thermodynamic parameter that is critical for the characterization of a protein’s function, and a key determinant for the production yield of a protein in both the research setting and within industrial (e.g., pharmaceutical) applications. Experimental approaches to predict protein solubility are costly, time-consuming, and frequently offer only low success rates. To reduce cost and expedite the development of therapeutic and industrially relevant proteins, a highly accurate computational tool for predicting protein solubility from protein sequence is sought. While a number of in silico prediction tools exist, they suffer from relatively low prediction accuracy, bias toward the soluble proteins, and limited applicability for various classes of proteins. In this study, we developed a novel deep learning sequence-based solubility predictor, DSResSol, that takes advantage of the integration of squeeze excitation residual networks with dilated convolutional neural networks and outperforms all existing protein solubility prediction models. This model captures the frequently occurring amino acid k-mers and their local and global interactions and highlights the importance of identifying long-range interaction information between amino acid k-mers to achieve improved accuracy, using only protein sequence as input. DSResSol outperforms all available sequence-based solubility predictors by at least 5% in terms of accuracy when evaluated by two different independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13% higher than existing models. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. Overall, DSResSol can be used for the fast, reliable, and inexpensive prediction of a protein’s solubility to guide experimental design. |
format | Online Article Text |
id | pubmed-8704505 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8704505 2021-12-25 DSResSol: A Sequence-Based Solubility Predictor Created with Dilated Squeeze Excitation Residual Networks Madani, Mohammad Lin, Kaixiang Tarakanova, Anna Int J Mol Sci Article MDPI 2021-12-17 /pmc/articles/PMC8704505/ /pubmed/34948354 http://dx.doi.org/10.3390/ijms222413555 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Madani, Mohammad Lin, Kaixiang Tarakanova, Anna DSResSol: A Sequence-Based Solubility Predictor Created with Dilated Squeeze Excitation Residual Networks |
title | DSResSol: A Sequence-Based Solubility Predictor Created with Dilated Squeeze Excitation Residual Networks |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8704505/ https://www.ncbi.nlm.nih.gov/pubmed/34948354 http://dx.doi.org/10.3390/ijms222413555 |
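
The description in this record refers to amino acid k-mers (single residues, dipeptides, and tripeptides). As a minimal illustration of what a k-mer is, the sketch below counts overlapping k-mers in a sequence string. It is not DSResSol's feature pipeline; the function name `kmer_counts` and the toy sequence are assumptions made up for this example.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers (length-k substrings) in a protein sequence."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = sequence.upper()
    # Slide a window of width k along the sequence, one residue at a time.
    # If the sequence is shorter than k, the range is empty and so is the result.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy sequence, not taken from the paper's data sets.
toy_sequence = "MSEESKEES"

for k in (1, 2, 3):
    # k=1: amino acids, k=2: dipeptides, k=3: tripeptides
    print(k, kmer_counts(toy_sequence, k).most_common(2))
```

Counting with a stride of one residue keeps every overlapping k-mer, so a sequence of length n yields n - k + 1 windows.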