DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences
Thermally stable proteins find extensive applications in industrial production, pharmaceutical development, and serve as a highly evolved starting point in protein engineering. The thermal stability of proteins is commonly characterized by their melting temperature (T(m)). However, due to the limite...
Main authors: | Li, Mengyu; Wang, Hongzhao; Yang, Zhenwu; Zhang, Longgui; Zhu, Yushan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681957/ https://www.ncbi.nlm.nih.gov/pubmed/38034401 http://dx.doi.org/10.1016/j.csbj.2023.11.006 |
_version_ | 1785150873857425408 |
---|---|
author | Li, Mengyu Wang, Hongzhao Yang, Zhenwu Zhang, Longgui Zhu, Yushan |
author_sort | Li, Mengyu |
collection | PubMed |
description | Thermally stable proteins find extensive applications in industrial production and pharmaceutical development, and serve as a highly evolved starting point in protein engineering. The thermal stability of proteins is commonly characterized by their melting temperature (T(m)). However, due to the limited availability of experimentally determined T(m) data and the insufficient accuracy of existing computational methods in predicting T(m), there is an urgent need for a computational approach to accurately forecast the T(m) values of thermophilic proteins. Here, we present a deep learning-based model, called DeepTM, which exclusively utilizes protein sequences as input and accurately predicts the T(m) values of target thermophilic proteins on a dataset consisting of 7790 thermophilic protein entries. On a test set of 1550 samples, DeepTM demonstrates excellent performance with a coefficient of determination (R(2)) of 0.75, Pearson correlation coefficient (P) of 0.87, and root mean square error (RMSE) of 6.24 ℃. We further analyzed the sequence features that determine the thermal stability of thermophilic proteins and found that dipeptide frequency, optimal growth temperature (OGT) of the host organisms, and the evolutionary information of the protein significantly affect its melting temperature. We compared the performance of DeepTM with recently reported methods, ProTstab2 and DeepSTABp, in predicting the T(m) values on two blind test datasets. One dataset comprised 22 PET plastic-degrading enzymes, while the other included 29 thermally stable proteins of broader classification. In the PET plastic-degrading enzyme dataset, DeepTM achieved an RMSE of 8.25 ℃. Compared to ProTstab2 (20.05 ℃) and DeepSTABp (20.97 ℃), DeepTM demonstrated a reduction in RMSE of 58.85% and 60.66%, respectively. In the dataset of thermally stable proteins, DeepTM (RMSE=7.66 ℃) demonstrated a 51.73% reduction in RMSE compared to ProTstab2 (RMSE=15.87 ℃). DeepTM, with the sole requirement of protein sequence information, accurately predicts the melting temperature and achieves a fully end-to-end prediction process, thus providing enhanced convenience and expediency for further protein engineering. |
format | Online Article Text |
id | pubmed-10681957 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-10681957 2023-11-30 DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences Li, Mengyu Wang, Hongzhao Yang, Zhenwu Zhang, Longgui Zhu, Yushan Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2023-11-04 /pmc/articles/PMC10681957/ /pubmed/38034401 http://dx.doi.org/10.1016/j.csbj.2023.11.006 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681957/ https://www.ncbi.nlm.nih.gov/pubmed/38034401 http://dx.doi.org/10.1016/j.csbj.2023.11.006 |
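
The relative RMSE reductions quoted in the description can be reproduced from the reported RMSE values alone. The following is a minimal sketch, not code from the DeepTM authors: the function names `rmse` and `relative_reduction` are illustrative assumptions, and only the RMSE figures themselves are taken from the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted T(m) values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100.0


# RMSE values (in degrees Celsius) as reported in the abstract.
PET_DEEPTM = 8.25
PET_PROTSTAB2 = 20.05
PET_DEEPSTABP = 20.97
STABLE_DEEPTM = 7.66
STABLE_PROTSTAB2 = 15.87

if __name__ == "__main__":
    print(f"PET set vs ProTstab2: {relative_reduction(PET_PROTSTAB2, PET_DEEPTM):.2f}%")
    print(f"PET set vs DeepSTABp: {relative_reduction(PET_DEEPSTABP, PET_DEEPTM):.2f}%")
    print(f"Stable set vs ProTstab2: {relative_reduction(STABLE_PROTSTAB2, STABLE_DEEPTM):.2f}%")
```

Rounded to two decimals, the three ratios agree with the 58.85%, 60.66%, and 51.73% stated in the abstract, so the quoted percentages are relative to each baseline method's RMSE.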