A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants
BACKGROUND: The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants. RESULTS: We report a...
Main authors: | Li, Yunqi; Middaugh, C Russell; Fang, Jianwen |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098108/ https://www.ncbi.nlm.nih.gov/pubmed/20109199 http://dx.doi.org/10.1186/1471-2105-11-62 |
_version_ | 1782203919436349440 |
---|---|
author | Li, Yunqi; Middaugh, C Russell; Fang, Jianwen |
author_sort | Li, Yunqi |
collection | PubMed |
description | BACKGROUND: The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants. RESULTS: We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting the relative thermostability of protein mutants. The scoring function was developed based on an elaborate analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It was constructed by a linear combination of ten important features identified by a feature ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted by a hill-climbing algorithm. This scoring function has shown an excellent ability to discriminate hyperthermophilic from mesophilic sequences. The prediction accuracies reached 98.9% and 97.3% in discriminating orthologous pairs in training and the holdout testing datasets, respectively. Moreover, the scoring function can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests using two datasets of experimentally investigated mutations demonstrated that the scoring function can be used to predict the relative thermostability of proteins and their mutants at very high accuracies (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins. CONCLUSIONS: We have presented a novel scoring function which can distinguish not only HP/MP ortholog pairs, but also non-homologous pairs at high accuracies. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. In addition, the residue substitution preference matrix assembled in this study may reflect the thermal adaptation induced substitution biases. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/. |
format | Text |
id | pubmed-3098108 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3098108 2011-05-20 BMC Bioinformatics Research Article BioMed Central 2010-01-28 /pmc/articles/PMC3098108/ /pubmed/20109199 http://dx.doi.org/10.1186/1471-2105-11-62 Text en Copyright ©2010 Li et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098108/ https://www.ncbi.nlm.nih.gov/pubmed/20109199 http://dx.doi.org/10.1186/1471-2105-11-62 |
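The abstract describes the scoring function as a linear combination of ten features whose weights were fitted by hill climbing, evaluated by how often the hyperthermophilic member of a pair scores above its mesophilic ortholog. The record does not name the features, the objective, or the hill-climbing variant, so the following is only a minimal sketch under stated assumptions: the features are anonymous numbers, the data are synthetic and deliberately easy, the objective is pairwise accuracy, and the search is a steepest-ascent variant chosen for determinism. All function names are hypothetical and do not come from the paper or its web server.

```python
import random


def score(weights, features):
    """Linear score: weighted sum of one sequence's feature values."""
    return sum(w * x for w, x in zip(weights, features))


def pair_accuracy(weights, pairs):
    """Fraction of (hyper, meso) pairs where the hyperthermophilic
    member receives the strictly higher score."""
    correct = sum(1 for hp, mp in pairs if score(weights, hp) > score(weights, mp))
    return correct / len(pairs)


def hill_climb(pairs, n_features, step_size=0.1, max_iters=100):
    """Steepest-ascent hill climbing (an assumed variant): start from zero
    weights, try nudging each weight up or down by step_size, take the best
    neighbor, and stop when no neighbor strictly improves accuracy."""
    weights = [0.0] * n_features
    best = pair_accuracy(weights, pairs)
    for _ in range(max_iters):
        best_neighbor, best_neighbor_acc = None, best
        for i in range(n_features):
            for delta in (step_size, -step_size):
                candidate = list(weights)
                candidate[i] += delta
                acc = pair_accuracy(candidate, pairs)
                if acc > best_neighbor_acc:
                    best_neighbor, best_neighbor_acc = candidate, acc
        if best_neighbor is None:  # local optimum reached
            break
        weights, best = best_neighbor, best_neighbor_acc
    return weights, best


def make_synthetic_pairs(n_pairs, n_features, seed=1):
    """Synthetic stand-in for ortholog pairs (NOT the paper's data):
    feature 0 is always higher in the 'hyperthermophilic' member,
    the remaining features differ only by noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        mp = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        hp = [x + rng.gauss(0.0, 0.5) for x in mp]
        hp[0] = mp[0] + 1.0 + rng.random()  # guaranteed positive shift
        pairs.append((hp, mp))
    return pairs


pairs = make_synthetic_pairs(n_pairs=200, n_features=10)
weights, accuracy = hill_climb(pairs, n_features=10)
print("fitted weights:", weights)
print("pairwise accuracy on synthetic data:", accuracy)
```

Because the synthetic pairs are separable on a single feature, the search stops after one step; real feature sets would need a held-out test set, as the abstract reports, to check that the fitted weights generalize.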