
Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD

SIMPLE SUMMARY: Many protein sequences are registered in biological databases, and with hundreds more recently determined by next-generation sequencing, the number of sequences with unknown functions is increasing explosively. To efficiently determine the annotatio...


Bibliographic Details
Main authors: Watanabe, Naoki, Kuriya, Yuki, Murata, Masahiro, Yamamoto, Masaki, Shimizu, Masayuki, Araki, Michihiro
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10295083/
https://www.ncbi.nlm.nih.gov/pubmed/37372080
http://dx.doi.org/10.3390/biology12060795
author Watanabe, Naoki
Kuriya, Yuki
Murata, Masahiro
Yamamoto, Masaki
Shimizu, Masayuki
Araki, Michihiro
author_sort Watanabe, Naoki
collection PubMed
description SIMPLE SUMMARY: Many protein sequences are registered in biological databases, and with hundreds more recently determined by next-generation sequencing, the number of sequences with unknown functions is increasing explosively. To annotate them efficiently, new ways of extracting features from protein sequences, different from existing knowledge, are required. Deep learning can extract various features based on training data. Many studies have reported deep learning models that predict protein annotations with high accuracy; however, those reports have not compared, across multiple deep learning models, which amino acid sites in a protein are important for the prediction. Here, 3 deep learning models for predicting the proteins included in a protein family were analyzed using an explainable artificial intelligence method to explore important protein features. Each model regarded different sites as important, and all models recognized as important features amino acids that differ from the secondary structure, conserved regions and active sites. These results suggest that the models can interpret protein sequences from perspectives different from existing knowledge.
ABSTRACT: The number of unannotated protein sequences is increasing explosively due to genome sequencing technology. A more comprehensive understanding of protein functions for protein annotation requires the discovery of new features that cannot be captured by conventional methods. Deep learning can extract important features from input data and predict protein functions based on those features. Here, protein feature vectors generated by 3 deep learning models are analyzed using Integrated Gradients to explore important features of amino acid sites. As a case study, prediction and feature extraction models for UbiD enzymes were built using these models. The important amino acid residues extracted from the models differed from the secondary structures, conserved regions and active sites known for UbiD. Interestingly, different amino acid residues within UbiD sequences were regarded as important depending on the type of model and the sequence. The Transformer models focused on more specific regions than the other models. These results suggest that each deep learning model understands protein features from aspects different from existing knowledge and has the potential to discover new laws of protein function. This study will help to extract new protein features for other protein annotations.
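The abstract names Integrated Gradients as the attribution method but gives no implementation details. The following is only a minimal sketch of the general technique, not the authors' code: it assumes a one-hot encoded sequence, an all-zeros baseline, and a toy logistic scorer (`toy_model`, with hypothetical weights `w`) standing in for a trained network. All function names and array sizes are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def toy_model(x, w):
    """Hypothetical stand-in for a trained classifier: sigmoid of a weighted sum."""
    return sigmoid(np.sum(w * x))


def toy_model_grad(x, w):
    """Analytic gradient of toy_model with respect to the input x."""
    p = toy_model(x, w)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    attribution = (x - baseline) * average gradient along the straight
    path from baseline to x.
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of each sub-interval
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += toy_model_grad(baseline + a * (x - baseline), w)
    return (x - baseline) * total / steps


def site_importance(attr):
    """Collapse per-(position, residue) attributions to one score per site."""
    return np.abs(attr).sum(axis=1)


# Illustrative input: 5 positions, 4-letter alphabet (assumed sizes).
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 4))
x = np.zeros((5, 4))
x[np.arange(5), [0, 2, 1, 3, 2]] = 1.0  # one-hot encoded toy sequence
baseline = np.zeros_like(x)

attr = integrated_gradients(x, baseline, w)
print(site_importance(attr))
```

A useful sanity check for any such implementation is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.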
format Online
Article
Text
id pubmed-10295083
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10295083 2023-06-28 Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD Watanabe, Naoki Kuriya, Yuki Murata, Masahiro Yamamoto, Masaki Shimizu, Masayuki Araki, Michihiro Biology (Basel) Article MDPI 2023-05-31 /pmc/articles/PMC10295083/ /pubmed/37372080 http://dx.doi.org/10.3390/biology12060795 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Different Recognition of Protein Features Depending on Deep Learning Models: A Case Study of Aromatic Decarboxylase UbiD
topic Article