Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework

Enzyme commission (EC) numbers, which associate a protein sequence with the biochemical reactions it catalyzes, are essential for the accurate understanding of enzyme functions and cellular metabolism. Many ab initio computational approaches were proposed to predict EC numbers for given input protei...

Bibliographic Details
Main authors: Shi, Zhenkun, Deng, Rui, Yuan, Qianqian, Mao, Zhitao, Wang, Ruoyu, Li, Haoran, Liao, Xiaoping, Ma, Hongwu
Format: Online Article Text
Language: English
Published: AAAS 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10232324/
https://www.ncbi.nlm.nih.gov/pubmed/37275124
http://dx.doi.org/10.34133/research.0153
_version_ 1785051950450999296
author Shi, Zhenkun
Deng, Rui
Yuan, Qianqian
Mao, Zhitao
Wang, Ruoyu
Li, Haoran
Liao, Xiaoping
Ma, Hongwu
author_sort Shi, Zhenkun
collection PubMed
description Enzyme commission (EC) numbers, which associate a protein sequence with the biochemical reactions it catalyzes, are essential for the accurate understanding of enzyme functions and cellular metabolism. Many ab initio computational approaches have been proposed to predict EC numbers for given input protein sequences. However, the prediction performance (accuracy, recall, and precision), usability, and efficiency of existing methods decrease markedly when dealing with recently discovered proteins, leaving much room for improvement. Here, we report HDMLF, a hierarchical dual-core multitask learning framework for accurately predicting EC numbers based on novel deep learning techniques. HDMLF is composed of an embedding core and a learning core; the embedding core adopts the latest protein language model for protein sequence embedding, and the learning core conducts the EC number prediction. Specifically, HDMLF is designed on the basis of a gated recurrent unit framework to perform EC number prediction in a multi-objective, hierarchical, multitask manner. Additionally, we introduced an attention layer to optimize the EC prediction and employed a greedy strategy to integrate and fine-tune the final model. Comparative analyses against 4 representative methods demonstrate that HDMLF consistently delivers the highest performance, improving accuracy and F1 score by 60% and 40% over the state of the art, respectively. An additional case study, in which tyrB is predicted to compensate for the loss of aspartate aminotransferase aspC, as reported in a previous experimental study, shows that our model can also be used to uncover enzyme promiscuity. Finally, we established a web platform, ECRECer (https://ecrecer.biodesign.ac.cn), using an entirely cloud-based serverless architecture and provided an offline bundle to improve usability.
format Online
Article
Text
id pubmed-10232324
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher AAAS
record_format MEDLINE/PubMed
spelling pubmed-10232324 2023-06-02 Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework Shi, Zhenkun Deng, Rui Yuan, Qianqian Mao, Zhitao Wang, Ruoyu Li, Haoran Liao, Xiaoping Ma, Hongwu Research (Wash D C) Research Article Enzyme commission (EC) numbers, which associate a protein sequence with the biochemical reactions it catalyzes, are essential for the accurate understanding of enzyme functions and cellular metabolism. Many ab initio computational approaches have been proposed to predict EC numbers for given input protein sequences. However, the prediction performance (accuracy, recall, and precision), usability, and efficiency of existing methods decrease markedly when dealing with recently discovered proteins, leaving much room for improvement. Here, we report HDMLF, a hierarchical dual-core multitask learning framework for accurately predicting EC numbers based on novel deep learning techniques. HDMLF is composed of an embedding core and a learning core; the embedding core adopts the latest protein language model for protein sequence embedding, and the learning core conducts the EC number prediction. Specifically, HDMLF is designed on the basis of a gated recurrent unit framework to perform EC number prediction in a multi-objective, hierarchical, multitask manner. Additionally, we introduced an attention layer to optimize the EC prediction and employed a greedy strategy to integrate and fine-tune the final model. Comparative analyses against 4 representative methods demonstrate that HDMLF consistently delivers the highest performance, improving accuracy and F1 score by 60% and 40% over the state of the art, respectively. An additional case study, in which tyrB is predicted to compensate for the loss of aspartate aminotransferase aspC, as reported in a previous experimental study, shows that our model can also be used to uncover enzyme promiscuity. Finally, we established a web platform, ECRECer (https://ecrecer.biodesign.ac.cn), using an entirely cloud-based serverless architecture and provided an offline bundle to improve usability. AAAS 2023-05-31 /pmc/articles/PMC10232324/ /pubmed/37275124 http://dx.doi.org/10.34133/research.0153 Text en Copyright © 2023 Zhenkun Shi et al. https://creativecommons.org/licenses/by/4.0/ Exclusive licensee Science and Technology Review Publishing House. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/).
title Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10232324/
https://www.ncbi.nlm.nih.gov/pubmed/37275124
http://dx.doi.org/10.34133/research.0153