
Scoring function to predict solubility mutagenesis


Bibliographic Details
Main authors: Tian, Ye; Deutsch, Christopher; Krishnamoorthy, Bala
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2958853/
https://www.ncbi.nlm.nih.gov/pubmed/20929563
http://dx.doi.org/10.1186/1748-7188-5-33
Description:
BACKGROUND: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild-type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable for making these choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention.
RESULTS: We use concepts from computational geometry to define a three-body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross-validation (CV) for training and prediction, we demonstrate that the LP method performs best overall. For LOOCV, the LP method has an overall accuracy of 81%.
AVAILABILITY: Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.
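The description outlines predicting whether a mutation increases or decreases solubility from a weighted score, with weights starting at 1 and confined to [0, 2], and evaluating the predictor by leave-one-out and k-fold cross-validation. The Python sketch below illustrates only that prediction-and-evaluation loop, under stated assumptions: each mutant is reduced to a hypothetical numeric feature vector (the paper's three-body geometric terms are not reproduced), the sign of the weighted sum gives the predicted direction, and the trainer is a simple clipped perceptron swapped in for illustration. It is not the authors' LP formulation, all function names are hypothetical, and the toy data are invented.

```python
"""Illustrative sketch: sign-of-score prediction with bounded weights and k-fold CV.

Assumptions (not from the paper): feature vectors are hypothetical per-mutant
numbers, and training uses a clipped perceptron instead of the authors' LP.
"""
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# (hypothetical feature vector, label): +1 = solubility increase, -1 = decrease
Example = Tuple[Vector, int]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Weighted linear score of one mutant."""
    return sum(w * x for w, x in zip(weights, features))


def predict(weights: Sequence[float], features: Sequence[float]) -> int:
    """Predict +1 (increase) for a positive score; ties count as -1 (decrease)."""
    return 1 if score(weights, features) > 0 else -1


def train_clipped_perceptron(data: Sequence[Example], epochs: int = 50,
                             lr: float = 0.1, lo: float = 0.0,
                             hi: float = 2.0) -> Vector:
    """Stand-in trainer, NOT the paper's LP: start all weights at 1 and
    keep each weight inside [lo, hi] = [0, 2] after every update."""
    weights = [1.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            if predict(weights, features) != label:
                mistakes += 1
                weights = [min(hi, max(lo, w + lr * label * x))
                           for w, x in zip(weights, features)]
        if mistakes == 0:  # every training mutant classified correctly
            break
    return weights


def cross_val_accuracy(data: Sequence[Example],
                       train: Callable[[Sequence[Example]], Vector],
                       k: int) -> float:
    """Fraction of held-out mutants whose direction is predicted correctly.

    Example i goes to fold i % k; k == len(data) is leave-one-out.
    """
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of examples")
    correct = 0
    for fold in range(k):
        held_out = [ex for i, ex in enumerate(data) if i % k == fold]
        training_set = [ex for i, ex in enumerate(data) if i % k != fold]
        weights = train(training_set)
        correct += sum(1 for f, y in held_out if predict(weights, f) == y)
    return correct / len(data)


# Made-up toy data, for illustration only.
TOY_DATA: List[Example] = [
    ([1.0, 0.5], 1),
    ([0.8, 0.2], 1),
    ([-1.0, -0.3], -1),
    ([-0.6, -0.9], -1),
]

if __name__ == "__main__":
    print(cross_val_accuracy(TOY_DATA, train_clipped_perceptron, k=len(TOY_DATA)))  # 1.0
```

Setting k to the number of mutants gives leave-one-out; k = 10 or k = 3 mirrors the other schemes named in the description. Assigning folds by index modulo k is a simplification; a real evaluation would usually shuffle or stratify the mutants first.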
Journal: Algorithms for Molecular Biology (Algorithms Mol Biol), article type: Research; published by BioMed Central on 2010-10-07. PMC: PMC2958853; PubMed: 20929563; DOI: 10.1186/1748-7188-5-33.
Rights: Copyright ©2010 Tian et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.