An Interpretable Double-Scale Attention Model for Enzyme Protein Class Prediction Based on Transformer Encoders and Multi-Scale Convolutions
Main authors: | Lin, Ken; Quan, Xiongwen; Jin, Chen; Shi, Zhuangwei; Yang, Jinglong |
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9012241/ https://www.ncbi.nlm.nih.gov/pubmed/35432476 http://dx.doi.org/10.3389/fgene.2022.885627 |
author | Lin, Ken Quan, Xiongwen Jin, Chen Shi, Zhuangwei Yang, Jinglong |
collection | PubMed |
description | Background: Classification and annotation of enzyme proteins are fundamental for research on enzymes in biological metabolism. Enzyme Commission (EC) numbers provide a standard for hierarchical enzyme class prediction, and several computational methods have been built on this standard. However, most of these methods depend on prior distribution information, and none explicitly quantifies amino-acid-level relations or the possible contribution of sub-sequences. Methods: In this study, we propose DAttProt, a double-scale attention enzyme class prediction model with high reusability and interpretability. DAttProt encodes sequences with self-supervised Transformer encoders during pre-training and gathers local features with multi-scale convolutions during fine-tuning. Specifically, a probabilistic double-scale attention weight matrix is designed to aggregate multi-scale features and positional prediction scores. Finally, a fully connected linear classifier performs the final inference from the aggregated features and prediction scores. Results: On the DEEPre and ECPred datasets, DAttProt is competitive with the compared methods on level 0 and outperforms them on deeper task levels, reaching 0.788 accuracy on level 2 of DEEPre and 0.967 macro-F1 on level 1 of ECPred. Moreover, through a case study, we demonstrate that the double-scale attention matrix learns to discover and focus on the positions and scales of bio-functional sub-sequences in the protein. Conclusion: DAttProt provides an effective and interpretable method for enzyme class prediction. It predicts enzyme protein classes accurately and, furthermore, discovers enzymatic functional sub-sequences such as protein motifs on both positional and spatial scales. |
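The aggregation step summarized in the Methods above (per-scale local feature maps combined by an attention matrix that is probabilistic over positions and scales jointly) can be sketched roughly as follows. This is a minimal NumPy illustration of the idea, not the authors' implementation: the window sizes, the mean-pooling stand-in for learned convolutions, and the toy scoring function are all assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flat array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_scale_features(h, window_sizes=(3, 5, 7)):
    """Pool encoder states h (L x d) over several window sizes
    ('same' padding), one feature map per scale: (scales, L, d).
    A learned convolution would replace this mean-pool in practice."""
    L, _ = h.shape
    feats = []
    for w in window_sizes:
        pad = w // 2
        hp = np.pad(h, ((pad, pad), (0, 0)), mode="edge")
        feats.append(np.stack([hp[i:i + w].mean(axis=0) for i in range(L)]))
    return np.stack(feats)

def double_scale_attention(feats):
    """Softmax jointly over (scale, position), so the whole weight
    matrix sums to 1, then pool the features with those weights."""
    scores = feats.mean(axis=-1)                 # (scales, L) toy scoring
    attn = softmax(scores.reshape(-1)).reshape(scores.shape)
    pooled = (attn[..., None] * feats).sum(axis=(0, 1))  # (d,)
    return pooled, attn

rng = np.random.default_rng(0)
h = rng.normal(size=(12, 8))      # 12 residues, 8-dim embeddings
pooled, attn = double_scale_attention(multi_scale_features(h))
assert np.isclose(attn.sum(), 1.0)  # probabilistic: weights sum to 1
print(pooled.shape, attn.shape)     # (8,) (3, 12)
```

Because `attn` is a proper distribution over the (scale, position) grid, inspecting its largest entries indicates which sub-sequence positions and window widths the model weighted most, which is the interpretability mechanism the abstract describes.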
format | Online Article Text |
id | pubmed-9012241 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9012241. Front Genet (Genetics). Frontiers Media S.A. 2022-04-01 /pmc/articles/PMC9012241/ /pubmed/35432476 http://dx.doi.org/10.3389/fgene.2022.885627 Text en Copyright © 2022 Lin, Quan, Jin, Shi and Yang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | An Interpretable Double-Scale Attention Model for Enzyme Protein Class Prediction Based on Transformer Encoders and Multi-Scale Convolutions |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9012241/ https://www.ncbi.nlm.nih.gov/pubmed/35432476 http://dx.doi.org/10.3389/fgene.2022.885627 |