Machine learning integration for predicting the effect of single amino acid substitutions on protein stability

BACKGROUND: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and st...

Bibliographic Details
Main Authors:	Özen, Ayşegül; Gönen, Mehmet; Alpaydın, Ethem; Haliloğlu, Türkan
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777163/ https://www.ncbi.nlm.nih.gov/pubmed/19840377 http://dx.doi.org/10.1186/1472-6807-9-66

author	Özen, Ayşegül; Gönen, Mehmet; Alpaydın, Ethem; Haliloğlu, Türkan
collection	PubMed
description	BACKGROUND: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the output of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high. RESULTS: We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615, which was used in previous studies, and (2) S2783, the updated version (as of July 2, 2009), also extracted from ProTherm. For the S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For the S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change, and we apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration.
CONCLUSION: We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. The overall accuracy of regression is not better than that of classification, but regression yields fewer false positives, especially when combined with the reject option. The server for stability prediction for three integration approaches and the data sets are available at .
format	Text
id	pubmed-2777163
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2777163 2009-11-15 Machine learning integration for predicting the effect of single amino acid substitutions on protein stability Özen, Ayşegül; Gönen, Mehmet; Alpaydın, Ethem; Haliloğlu, Türkan BMC Struct Biol Research Article BioMed Central 2009-10-19 /pmc/articles/PMC2777163/ /pubmed/19840377 http://dx.doi.org/10.1186/1472-6807-9-66 Text en Copyright © 2009 Özen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Machine learning integration for predicting the effect of single amino acid substitutions on protein stability
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2777163/ https://www.ncbi.nlm.nih.gov/pubmed/19840377 http://dx.doi.org/10.1186/1472-6807-9-66
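
Illustrative note on the abstract: the description field mentions late integration (combining decisions) and a risk-based reject option. The sketch below shows one common way these two ideas fit together; it is not the authors' implementation. It assumes that each base predictor outputs a probability for the positive sign of stability change, that decisions are combined by simple averaging, and that the three loss values are hypothetical numbers chosen only for illustration.

```python
from statistics import mean

# Hypothetical loss values (assumptions, not taken from the article).
COST_FALSE_POSITIVE = 2.0   # predicting "positive" when the truth is negative
COST_FALSE_NEGATIVE = 1.0   # predicting "negative" when the truth is positive
COST_REJECT = 0.3           # declining to predict


def late_integration(positive_probs):
    """Combine base predictors by averaging their positive-class probabilities."""
    return mean(positive_probs)


def decide(p_positive,
           cost_fp=COST_FALSE_POSITIVE,
           cost_fn=COST_FALSE_NEGATIVE,
           cost_reject=COST_REJECT):
    """Pick the action with the smallest expected loss.

    Returns +1 (positive), -1 (negative), or None (reject).
    """
    risks = {
        1: (1.0 - p_positive) * cost_fp,   # expected loss of predicting positive
        -1: p_positive * cost_fn,          # expected loss of predicting negative
        None: cost_reject,                 # fixed loss of rejecting
    }
    return min(risks, key=risks.get)


if __name__ == "__main__":
    # Three hypothetical base predictors scoring three substitutions.
    for probs in ([0.9, 0.8, 0.95], [0.6, 0.5, 0.7], [0.1, 0.05, 0.15]):
        combined = late_integration(probs)
        print(probs, round(combined, 3), decide(combined))
```

Making a false positive more expensive than a false negative, as assumed here, pushes uncertain cases toward rejection rather than toward a positive call, which is the kind of trade-off between rejection rate and false positives that the abstract reports.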