ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites
Lysine glutarylation is a post-translational modification (PTM) that plays a regulatory role in various physiological and biological processes. Identifying glutarylated peptides using proteomic techniques is expensive and time-consuming. Therefore, developing computational models and predictors can...
Main Authors: | Indriani, Fatma; Mahmudah, Kunti Robiatul; Purnama, Bedy; Satou, Kenji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9194472/ https://www.ncbi.nlm.nih.gov/pubmed/35711929 http://dx.doi.org/10.3389/fgene.2022.885929 |
_version_ | 1784726735866036224 |
---|---|
author | Indriani, Fatma Mahmudah, Kunti Robiatul Purnama, Bedy Satou, Kenji |
author_sort | Indriani, Fatma |
collection | PubMed |
description | Lysine glutarylation is a post-translational modification (PTM) that plays a regulatory role in various physiological and biological processes. Identifying glutarylated peptides using proteomic techniques is expensive and time-consuming. Therefore, developing computational models and predictors can prove useful for rapid identification of glutarylation. In this study, we propose a model called ProtTrans-Glutar that classifies a protein sequence as a positive or negative glutarylation site by combining traditional sequence-based features with features derived from a pre-trained transformer-based protein model. The features of the model were constructed by combining several feature sets, namely the distribution feature (from composition/transition/distribution encoding), enhanced amino acid composition (EAAC), and features derived from the ProtT5-XL-UniRef50 model. Combined with random under-sampling and the XGBoost classification method, our model obtained recall, specificity, and AUC scores of 0.7864, 0.6286, and 0.7075, respectively, on an independent test set. The recall and AUC scores were notably higher than those of the previous glutarylation prediction models using the same dataset. This high recall score suggests that our method has the potential to identify new glutarylation sites and facilitate further research on the glutarylation process. |
format | Online Article Text |
id | pubmed-9194472 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9194472 2022-06-15 ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites Indriani, Fatma Mahmudah, Kunti Robiatul Purnama, Bedy Satou, Kenji Front Genet Genetics Frontiers Media S.A. 2022-05-31 /pmc/articles/PMC9194472/ /pubmed/35711929 http://dx.doi.org/10.3389/fgene.2022.885929 Text en Copyright © 2022 Indriani, Mahmudah, Purnama and Satou. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites |
topic | Genetics |
work_keys_str_mv | AT indrianifatma prottransglutarincorporatingfeaturesfrompretrainedtransformerbasedmodelsforpredictingglutarylationsites AT mahmudahkuntirobiatul prottransglutarincorporatingfeaturesfrompretrainedtransformerbasedmodelsforpredictingglutarylationsites AT purnamabedy prottransglutarincorporatingfeaturesfrompretrainedtransformerbasedmodelsforpredictingglutarylationsites AT satoukenji prottransglutarincorporatingfeaturesfrompretrainedtransformerbasedmodelsforpredictingglutarylationsites |
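
The description field names three steps that are easy to illustrate in isolation: EAAC encoding, random under-sampling, and recall/specificity scoring. The following is a minimal Python sketch of those steps only, not the authors' implementation. The function names, the window size of 5, the choice to divide counts by the window size, and the example peptide are all assumptions made for illustration; the distribution (CTD) feature, the ProtT5-XL-UniRef50 embeddings, and the XGBoost classifier are left out.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def eaac(peptide, window=5):
    """Enhanced amino acid composition (sketch).

    Slides a fixed-size window over the peptide and records the frequency
    of each of the 20 residues inside every window. The window size and
    the division by `window` are assumptions of this sketch.
    """
    features = []
    for start in range(len(peptide) - window + 1):
        counts = Counter(peptide[start:start + window])
        # Characters outside AMINO_ACIDS (e.g. padding) add to no feature.
        features.extend(counts[aa] / window for aa in AMINO_ACIDS)
    return features


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class has the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        balanced.extend((sample, label) for sample in rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def recall_specificity(y_true, y_pred):
    """Recall = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 11-residue peptide: 7 windows of size 5, 20 values per window.
vector = eaac("AAKKLGSKEDA")

# Hypothetical imbalanced data set: 3 positives, 7 negatives.
balanced = random_undersample(list(range(10)), [1, 1, 1, 0, 0, 0, 0, 0, 0, 0])

# Mean of the recall and specificity reported in the description field.
mean_of_reported = (0.7864 + 0.6286) / 2
```

As a side check, the mean of the reported recall and specificity, (0.7864 + 0.6286) / 2, equals the reported AUC of 0.7075. That is the value a ROC curve built from a single decision threshold would give, although this record does not state how the authors computed the AUC.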