DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences

Bibliographic Details
Main authors: Li, Mengyu; Wang, Hongzhao; Yang, Zhenwu; Zhang, Longgui; Zhu, Yushan
Format: Online article (text)
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10681957/
https://www.ncbi.nlm.nih.gov/pubmed/38034401
http://dx.doi.org/10.1016/j.csbj.2023.11.006

Description: Thermally stable proteins are widely used in industrial production and pharmaceutical development, and they serve as highly evolved starting points for protein engineering. The thermal stability of a protein is commonly characterized by its melting temperature (Tm). However, experimentally determined Tm data are scarce and existing computational methods predict Tm with insufficient accuracy, so an accurate computational approach for forecasting the Tm values of thermophilic proteins is urgently needed. Here, we present DeepTM, a deep learning model that takes only protein sequences as input and accurately predicts the Tm values of thermophilic proteins in a dataset of 7790 thermophilic protein entries. On a test set of 1550 samples, DeepTM achieves a coefficient of determination (R²) of 0.75, a Pearson correlation coefficient (P) of 0.87, and a root mean square error (RMSE) of 6.24 ℃. Analysis of the sequence features that determine the thermal stability of thermophilic proteins showed that dipeptide frequency, the optimal growth temperature (OGT) of the host organism, and the evolutionary information of the protein significantly affect its melting temperature. We compared DeepTM with two recently reported methods, ProTstab2 and DeepSTABp, on two blind test datasets: one of 22 PET plastic-degrading enzymes and one of 29 thermally stable proteins from broader classes. On the PET-degrading enzyme dataset, DeepTM achieved an RMSE of 8.25 ℃, compared with 20.05 ℃ for ProTstab2 and 20.97 ℃ for DeepSTABp, a reduction in RMSE of 58.85% and 60.66%, respectively. On the thermally stable protein dataset, DeepTM (RMSE = 7.66 ℃) reduced the RMSE by 51.73% relative to ProTstab2 (RMSE = 15.87 ℃). Because it requires only the protein sequence, DeepTM provides a fully end-to-end Tm prediction process, making it a convenient and fast tool for protein engineering.
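The percentage reductions quoted in the description follow from the relative change in RMSE, (RMSE_baseline - RMSE_DeepTM) / RMSE_baseline. The Python sketch below is a minimal illustration assuming the standard textbook definitions of RMSE, R² and Pearson's correlation; the helper names (`rmse`, `r_squared`, `pearson_r`, `rmse_reduction`) are hypothetical and are not taken from the DeepTM code. It recomputes the three reported reductions from the published RMSE values.

```python
import math
from typing import Sequence

# Hypothetical helpers illustrating standard regression metrics;
# they are NOT taken from the DeepTM implementation.


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same unit as the inputs (e.g. degrees Celsius)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse_reduction(baseline_rmse: float, model_rmse: float) -> float:
    """Relative RMSE reduction of a model versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# RMSE values (degrees Celsius) reported in the description.
print(f"{rmse_reduction(20.05, 8.25):.2f}")  # vs ProTstab2, PET enzymes -> 58.85
print(f"{rmse_reduction(20.97, 8.25):.2f}")  # vs DeepSTABp, PET enzymes -> 60.66
print(f"{rmse_reduction(15.87, 7.66):.2f}")  # vs ProTstab2, 29 proteins -> 51.73
```

Only the percentages can be checked from this record; the `r_squared` and `pearson_r` helpers just spell out the metric definitions and would need the per-protein predicted and measured Tm values, which the record does not include.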

Journal: Comput Struct Biotechnol J
Article type: Research Article
Date: 2023-11-04
License: © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).