
Learning to predict expression efficacy of vectors in recombinant protein production

BACKGROUND: Recombinant protein production is a useful biotechnology to produce a large quantity of highly soluble proteins. Currently, the most widely used production system is to fuse a target protein into different vectors in Escherichia coli (E. coli). However, the production efficacy of differe...


Bibliographic Details
Main authors: Chan, Wen-Ching; Liang, Po-Huang; Shih, Yan-Ping; Yang, Ueng-Cheng; Lin, Wen-chang; Hsu, Chun-Nan
Format: Text
Language: English
Published: BioMed Central 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009492/
https://www.ncbi.nlm.nih.gov/pubmed/20122193
http://dx.doi.org/10.1186/1471-2105-11-S1-S21
author Chan, Wen-Ching
Liang, Po-Huang
Shih, Yan-Ping
Yang, Ueng-Cheng
Lin, Wen-chang
Hsu, Chun-Nan
collection PubMed
description BACKGROUND: Recombinant protein production is a useful biotechnology for producing large quantities of highly soluble proteins. Currently, the most widely used production system is to fuse a target protein into different vectors in Escherichia coli (E. coli). However, the production efficacy of different vectors varies for different target proteins. Trial and error is still the common practice for finding out the efficacy of a vector for a given target protein. Previous studies are limited in that they assumed that proteins would be over-expressed and focused only on the solubility of expressed proteins. In fact, many pairings of vectors and proteins result in no expression. RESULTS: In this study, we applied machine learning to train models that predict whether a vector-protein pairing will or will not express in E. coli. For expressed cases, the models further predict whether the expressed proteins would be soluble. We collected a set of real cases from the clients of our recombinant protein production core facility, where six different vectors were designed and studied. This set of cases is used in both training and evaluation of our models. We evaluate three different models based on support vector machines (SVMs) and their ensembles. Unlike many previous works, these models consider the sequence of the target protein as well as the sequence of the whole fusion vector as the features. We show that a model that classifies a case into one of the three classes (no expression, inclusion body and soluble) outperforms a model that considers the nested structure of the three classes, while a model that can take advantage of the hierarchical structure of the three classes performs slightly worse than, but comparably to, the best model. Meanwhile, we show that our best method still achieves the highest prediction accuracy when compared to previous works. Lastly, we briefly present two methods to use the trained model in the design of recombinant protein production systems to improve the chance of producing highly soluble proteins. CONCLUSION: In this paper, we show that a machine learning approach to the prediction of the efficacy of a vector for a target protein in a recombinant protein production system is promising and may complement traditional knowledge-driven study of the efficacy. We will release our program to share with other labs in the public domain when this paper is published.
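The abstract contrasts a flat three-class prediction with schemes that exploit the ordering "expressed or not, then soluble or not". The following is a minimal illustrative sketch of that contrast, not the authors' program. The amino-acid composition features, all function names, and the toy scorers are assumptions made for illustration; in the study the decision functions would be trained SVMs, and the abstract does not state which sequence features were used.

```python
from collections import Counter
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ("no expression", "inclusion body", "soluble")


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def features(vector_tag: str, target: str) -> List[float]:
    """Assumed feature set: composition of the target alone, followed by
    composition of the whole fusion (vector tag + target)."""
    return composition(target) + composition(vector_tag + target)


def predict_flat(x: List[float],
                 scorers: Dict[str, Callable[[List[float]], float]]) -> str:
    """Flat scheme: one scorer per class, return the highest-scoring class."""
    return max(CLASSES, key=lambda c: scorers[c](x))


def predict_two_stage(x: List[float],
                      expresses: Callable[[List[float]], bool],
                      is_soluble: Callable[[List[float]], bool]) -> str:
    """Two-stage scheme: decide expression first, solubility only if expressed."""
    if not expresses(x):
        return "no expression"
    return "soluble" if is_soluble(x) else "inclusion body"


if __name__ == "__main__":
    # Hypothetical His-tag fusion and a short made-up target sequence.
    x = features("MHHHHHH", "MKTAYIAKQR")
    # Toy stand-ins for trained classifiers (placeholders, not real models).
    toy_scorers = {
        "no expression": lambda v: 0.1,
        "inclusion body": lambda v: 0.2,
        "soluble": lambda v: 0.7,
    }
    print(predict_flat(x, toy_scorers))
    print(predict_two_stage(x, lambda v: True, lambda v: False))
```

The two-stage function is only one way to use the class ordering; the abstract does not specify how its "nested" and "hierarchical" models differ, so no correspondence to either is claimed here.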
format Text
id pubmed-3009492
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3009492 2010-12-23 Learning to predict expression efficacy of vectors in recombinant protein production Chan, Wen-Ching Liang, Po-Huang Shih, Yan-Ping Yang, Ueng-Cheng Lin, Wen-chang Hsu, Chun-Nan BMC Bioinformatics Research BioMed Central 2010-01-18 /pmc/articles/PMC3009492/ /pubmed/20122193 http://dx.doi.org/10.1186/1471-2105-11-S1-S21 Text en Copyright © 2010 Chan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Learning to predict expression efficacy of vectors in recombinant protein production
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009492/
https://www.ncbi.nlm.nih.gov/pubmed/20122193
http://dx.doi.org/10.1186/1471-2105-11-S1-S21