
Evidential deep learning for trustworthy prediction of enzyme commission number

The rapid growth of uncharacterized enzymes and their functional diversity urge accurate and trustworthy computational functional annotation tools. However, current state-of-the-art models lack trustworthiness on the prediction of the multilabel classification problem with thousands of classes. Here...


Bibliographic Details
Main authors: Han, So-Ra, Park, Mingyu, Kosaraju, Sai, Lee, JeungMin, Lee, Hyun, Lee, Jun Hyuck, Oh, Tae-Jin, Kang, Mingon
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10664415/
https://www.ncbi.nlm.nih.gov/pubmed/37991247
http://dx.doi.org/10.1093/bib/bbad401
author Han, So-Ra
Park, Mingyu
Kosaraju, Sai
Lee, JeungMin
Lee, Hyun
Lee, Jun Hyuck
Oh, Tae-Jin
Kang, Mingon
collection PubMed
description The rapid growth of uncharacterized enzymes and their functional diversity urge accurate and trustworthy computational functional annotation tools. However, current state-of-the-art models lack trustworthiness on the prediction of the multilabel classification problem with thousands of classes. Here, we demonstrate that a novel evidential deep learning model (named ECPICK) makes trustworthy predictions of enzyme commission (EC) numbers with data-driven domain-relevant evidence, which results in significantly enhanced predictive power and the capability to discover potential new motif sites. ECPICK learns complex sequential patterns of amino acids and their hierarchical structures from 20 million enzyme data. ECPICK identifies significant amino acids that contribute to the prediction without multiple sequence alignment. Our intensive assessment showed not only outstanding enhancement of predictive performance on the largest databases of Uniprot, Protein Data Bank (PDB) and Kyoto Encyclopedia of Genes and Genomes (KEGG), but also a capability to discover new motif sites in microorganisms. ECPICK is a reliable EC number prediction tool to identify protein functions of an increasing number of uncharacterized enzymes.
format Online
Article
Text
id pubmed-10664415
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10664415 2023-11-22
Evidential deep learning for trustworthy prediction of enzyme commission number
Han, So-Ra, Park, Mingyu, Kosaraju, Sai, Lee, JeungMin, Lee, Hyun, Lee, Jun Hyuck, Oh, Tae-Jin, Kang, Mingon
Brief Bioinform
Problem Solving Protocol
Oxford University Press 2023-11-22
/pmc/articles/PMC10664415/ /pubmed/37991247 http://dx.doi.org/10.1093/bib/bbad401
Text en
© The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evidential deep learning for trustworthy prediction of enzyme commission number
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10664415/
https://www.ncbi.nlm.nih.gov/pubmed/37991247
http://dx.doi.org/10.1093/bib/bbad401