
Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins

BACKGROUND: Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. Prediction performan...


Bibliographic Details
Main Authors:	Folkman, Lukas, Stantic, Bela, Sattar, Abdul
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046685/ https://www.ncbi.nlm.nih.gov/pubmed/24564514 http://dx.doi.org/10.1186/1471-2164-15-S1-S4

_version_	1782480298891542528
author	Folkman, Lukas Stantic, Bela Sattar, Abdul
author_facet	Folkman, Lukas Stantic, Bela Sattar, Abdul
author_sort	Folkman, Lukas
collection	PubMed
description	BACKGROUND: Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. Prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, different mutations of the same protein, and even the same residue, as encountered during training are commonly used for evaluation. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set. RESULTS: We provided experimental evidence of the limitations of the evaluation commonly used for assessing the prediction performance. Furthermore, we demonstrated that the prediction of stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features which led to over-fitting and further extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated with an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and was able to classify correctly 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved a correlation of 0.51 between predicted and experimentally measured stability changes. CONCLUSIONS: Commonly adopted evaluation with mutations in the same protein, and even the same residue, randomly divided between the training and test sets leads to an overestimation of prediction performance. Therefore, stability change prediction methods should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-S1-S4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4046685
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-40466852014-06-06 Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins Folkman, Lukas Stantic, Bela Sattar, Abdul BMC Genomics Proceedings BACKGROUND: Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. Prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, different mutations of the same protein, and even the same residue, as encountered during training are commonly used for evaluation. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set. RESULTS: We provided experimental evidence of the limitations of the evaluation commonly used for assessing the prediction performance. Furthermore, we demonstrated that the prediction of stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features which led to over-fitting and further extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated with an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and was able to classify correctly 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved a correlation of 0.51 between predicted and experimentally measured stability changes. CONCLUSIONS: Commonly adopted evaluation with mutations in the same protein, and even the same residue, randomly divided between the training and test sets leads to an overestimation of prediction performance. Therefore, stability change prediction methods should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-S1-S4) contains supplementary material, which is available to authorized users. BioMed Central 2014-01-24 /pmc/articles/PMC4046685/ /pubmed/24564514 http://dx.doi.org/10.1186/1471-2164-15-S1-S4 Text en © Folkman et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Proceedings Folkman, Lukas Stantic, Bela Sattar, Abdul Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins
title	Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins
title_full	Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins
title_fullStr	Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins
title_full_unstemmed	Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins
title_short	Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins
title_sort	towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046685/ https://www.ncbi.nlm.nih.gov/pubmed/24564514 http://dx.doi.org/10.1186/1471-2164-15-S1-S4
work_keys_str_mv	AT folkmanlukas towardssequencebasedpredictionofmutationinducedstabilitychangesinunseennonhomologousproteins AT stanticbela towardssequencebasedpredictionofmutationinducedstabilitychangesinunseennonhomologousproteins AT sattarabdul towardssequencebasedpredictionofmutationinducedstabilitychangesinunseennonhomologousproteins
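
The evaluation protocol argued for in the abstract (testing only on proteins that contribute no mutations to training) and the Matthews correlation coefficient it reports can be illustrated with a minimal Python sketch. This is not the authors' code: the record fields `protein` and `label`, the function names, and the default 20% test fraction are assumptions made for illustration. The sketch groups by protein identifier only; enforcing low sequence similarity between the two sets would also require a homology clustering step, which is not shown.

```python
import math
import random
from collections import defaultdict


def split_by_protein(mutations, test_fraction=0.2, seed=0):
    """Split mutation records so that no protein appears in both sets.

    `mutations` is a list of dicts with a hypothetical "protein" key.
    Whole proteins, not individual mutations, are assigned to a set.
    """
    by_protein = defaultdict(list)
    for record in mutations:
        by_protein[record["protein"]].append(record)

    proteins = sorted(by_protein)
    random.Random(seed).shuffle(proteins)

    # Hold out at least one protein for testing.
    n_test = max(1, round(len(proteins) * test_fraction))
    test_ids = set(proteins[:n_test])

    train = [r for p in proteins if p not in test_ids for r in by_protein[p]]
    test = [r for p in proteins if p in test_ids for r in by_protein[p]]
    return train, test


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A random split over individual mutations would instead shuffle the records themselves, allowing mutations of the same protein, or even the same residue, to land on both sides, which is the situation the abstract identifies as leading to overestimated performance.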
