ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites

Lysine glutarylation is a post-translational modification (PTM) that plays a regulatory role in various physiological and biological processes. Identifying glutarylated peptides using proteomic techniques is expensive and time-consuming. Therefore, developing computational models and predictors can...

Bibliographic Details
Main Authors: Indriani, Fatma; Mahmudah, Kunti Robiatul; Purnama, Bedy; Satou, Kenji
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9194472/
https://www.ncbi.nlm.nih.gov/pubmed/35711929
http://dx.doi.org/10.3389/fgene.2022.885929
author Indriani, Fatma
Mahmudah, Kunti Robiatul
Purnama, Bedy
Satou, Kenji
author_sort Indriani, Fatma
collection PubMed
description Lysine glutarylation is a post-translational modification (PTM) that plays a regulatory role in various physiological and biological processes. Identifying glutarylated peptides using proteomic techniques is expensive and time-consuming. Therefore, developing computational models and predictors can prove useful for rapid identification of glutarylation. In this study, we propose a model called ProtTrans-Glutar to classify a protein sequence into positive or negative glutarylation site by combining traditional sequence-based features with features derived from a pre-trained transformer-based protein model. The features of the model were constructed by combining several feature sets, namely the distribution feature (from composition/transition/distribution encoding), enhanced amino acid composition (EAAC), and features derived from the ProtT5-XL-UniRef50 model. Combined with random under-sampling and XGBoost classification method, our model obtained recall, specificity, and AUC scores of 0.7864, 0.6286, and 0.7075 respectively on an independent test set. The recall and AUC scores were notably higher than those of the previous glutarylation prediction models using the same dataset. This high recall score suggests that our method has the potential to identify new glutarylation sites and facilitate further research on the glutarylation process.
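The description above names enhanced amino acid composition (EAAC), random under-sampling, and the recall and specificity metrics. The following is a minimal Python sketch of those three pieces, not the authors' implementation. The window size of 5, the function names, and the example peptide are assumptions made for illustration; none of them is taken from this record. The ProtT5-XL-UniRef50 embeddings, the distribution feature, and the XGBoost classifier are not reproduced here.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(peptide, window=5):
    """EAAC sketch: slide a fixed-size window along the peptide and
    concatenate the amino acid frequencies of every window.
    The window size of 5 is an assumption, not a value from the record."""
    features = []
    for start in range(len(peptide) - window + 1):
        counts = Counter(peptide[start:start + window])
        features.extend(counts[aa] / window for aa in AMINO_ACIDS)
    return features


def random_under_sample(samples, labels, seed=0):
    """Randomly discard samples from the larger classes until every
    class has as many samples as the smallest one."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        balanced.extend((s, label) for s in rng.sample(group, target))
    rng.shuffle(balanced)
    xs = [s for s, _ in balanced]
    ys = [y for _, y in balanced]
    return xs, ys


def recall_and_specificity(y_true, y_pred):
    """Recall = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes binary labels (1 = positive site) and that both classes
    occur in y_true, otherwise a division by zero is raised."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical 7-residue peptide: 3 windows of 20 frequencies each.
    print(len(eaac("AAKAAGH")))
    xs, ys = random_under_sample(list(range(10)), [1, 1, 1] + [0] * 7)
    print(sorted(ys))
    print(recall_and_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a pipeline like the one described, the EAAC vector would be concatenated with the other feature sets before under-sampling the training data and fitting the classifier; under-sampling would normally be applied to the training split only, so that the independent test set keeps its original class balance.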
format Online
Article
Text
id pubmed-9194472
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9194472 2022-06-15 Front Genet Genetics Frontiers Media S.A. 2022-05-31 /pmc/articles/PMC9194472/ /pubmed/35711929 http://dx.doi.org/10.3389/fgene.2022.885929 Text en Copyright © 2022 Indriani, Mahmudah, Purnama and Satou. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9194472/
https://www.ncbi.nlm.nih.gov/pubmed/35711929
http://dx.doi.org/10.3389/fgene.2022.885929