SoluProt: prediction of soluble protein expression in Escherichia coli

MOTIVATION: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Esc...

Bibliographic Details
Main authors: Hon, Jiri, Marusiak, Martin, Martinek, Tomas, Kunka, Antonin, Zendulka, Jaroslav, Bednar, David, Damborsky, Jiri
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8034534/
https://www.ncbi.nlm.nih.gov/pubmed/33416864
http://dx.doi.org/10.1093/bioinformatics/btaa1102
author Hon, Jiri
Marusiak, Martin
Martinek, Tomas
Kunka, Antonin
Zendulka, Jaroslav
Bednar, David
Damborsky, Jiri
author_sort Hon, Jiri
collection PubMed
description MOTIVATION: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins. RESULTS: A new tool for sequence-based prediction of soluble protein expression in E.coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt’s accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies. SoluProt is freely available as a standalone program and a user-friendly webserver at https://loschmidt.chemi.muni.cz/soluprot/. AVAILABILITY AND IMPLEMENTATION: https://loschmidt.chemi.muni.cz/soluprot/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
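The two figures quoted in the abstract, accuracy and AUC on a balanced test set, can be illustrated with a minimal sketch. This is not SoluProt's code: the toy labels and scores, the function names, and the 0.5 decision threshold are all assumptions made for illustration only.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of sequences whose thresholded score matches the label.

    The 0.5 threshold is an assumption for this sketch.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a soluble (1) example outscores an
    insoluble (0) one, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical balanced toy set: three soluble, three insoluble sequences.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

On a balanced set, a predictor that guesses at random is expected to score about 50% accuracy and an AUC of about 0.5, which is the baseline against which the reported 58.5% and 0.62 should be read.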
format Online
Article
Text
id pubmed-8034534
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8034534 2021-04-14 SoluProt: prediction of soluble protein expression in Escherichia coli Hon, Jiri Marusiak, Martin Martinek, Tomas Kunka, Antonin Zendulka, Jaroslav Bednar, David Damborsky, Jiri Bioinformatics Original Papers Oxford University Press 2021-01-08 /pmc/articles/PMC8034534/ /pubmed/33416864 http://dx.doi.org/10.1093/bioinformatics/btaa1102 Text en © The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SoluProt: prediction of soluble protein expression in Escherichia coli
topic Original Papers