A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli

Bibliographic Details
Main Authors: Habibi, Narjeskhatoon, Mohd Hashim, Siti Z, Norouzi, Alireza, Samian, Mohammed Razip
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4098780/
https://www.ncbi.nlm.nih.gov/pubmed/24885721
http://dx.doi.org/10.1186/1471-2105-15-134
_version_ 1782326394498318336
author Habibi, Narjeskhatoon
Mohd Hashim, Siti Z
Norouzi, Alireza
Samian, Mohammed Razip
collection PubMed
description BACKGROUND: Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both the biopharmaceutical and research arenas in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low production levels and frequent failure for opaque reasons. The problem is accentuated because no generic solution is available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental conditions, including temperature, expression host, etc., protein solubility is a feature ultimately defined by its sequence. To date, numerous machine learning methods have been proposed to predict the solubility of a protein from its amino acid sequence alone. Despite 20 years of research on the matter, no comprehensive review of the published methods is available. RESULTS: This paper presents an extensive review of the existing models for predicting protein solubility in the Escherichia coli recombinant protein overexpression system. The models are investigated and compared with regard to the datasets used, features, feature selection methods, machine learning techniques and prediction accuracy. A discussion of the models is provided at the end. CONCLUSIONS: This study aims to investigate extensively the machine learning based methods for predicting recombinant protein solubility, so as to offer both a general and a detailed understanding for researchers in the field. Some of the models offer acceptable prediction performance and convenient user interfaces. These models can be considered valuable tools for predicting recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.
format Online
Article
Text
id pubmed-4098780
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4098780 2014-07-16 A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli Habibi, Narjeskhatoon Mohd Hashim, Siti Z Norouzi, Alireza Samian, Mohammed Razip BMC Bioinformatics Research Article BioMed Central 2014-05-08 /pmc/articles/PMC4098780/ /pubmed/24885721 http://dx.doi.org/10.1186/1471-2105-15-134 Text en Copyright © 2014 Habibi et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4098780/
https://www.ncbi.nlm.nih.gov/pubmed/24885721
http://dx.doi.org/10.1186/1471-2105-15-134