MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy

Bibliographic Details
Main authors: Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634282/
https://www.ncbi.nlm.nih.gov/pubmed/37969991
http://dx.doi.org/10.1021/acsomega.3c07086
author Wang, Gang-Ao
Yan, Xiaodi
Li, Xiang
Liu, Yinbo
Xia, Junfeng
Zhu, Xiaolei
collection PubMed
description As one of the most important post-translational modifications (PTMs), lysine acetylation (Kace) plays an important role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. As an alternative, several machine learning methods have been developed for Kace site prediction, using hand-crafted features to encode the protein sequences. However, two challenges remain: the complex biological information may be under-represented by these hand-crafted features, and the small-sample issue of some species needs to be addressed. We propose a novel model, MSTL-Kace, which was developed based on a transfer learning strategy with a pretrained bidirectional encoder representations from transformers (BERT) model. In this model, high-level embeddings were extracted from species-specific BERT models, and a two-stage fine-tuning strategy was used to deal with the small-sample issue. Specifically, a domain-specific BERT model was pretrained using all of the sequences in our data sets and was then fine-tuned, or two-stage fine-tuned, on the training data set of each species to obtain the species-specific BERT models. Afterward, the residue embeddings were extracted from the fine-tuned models and fed to different downstream learning algorithms. After comparison, the best model for the six prokaryotic species was built using a random forest. The results on the independent test sets show that our model outperforms the state-of-the-art methods on all six species. The source code and data for MSTL-Kace are available at https://github.com/leo97king/MSTL-Kace.
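For illustration only: Kace site prediction, as described in the abstract above, scores each lysine (K) in a protein sequence as a candidate site. The sketch below enumerates those candidates as fixed-length windows centered on each K. The function name lysine_windows, the flank size, the 'X' padding character, and the example sequence are hypothetical choices made for this sketch; they are not taken from the paper or from the MSTL-Kace repository.

```python
def lysine_windows(sequence, flank=3, pad="X"):
    """Return (position, window) pairs for every lysine (K) in a protein sequence.

    position is 1-based; each window has length 2 * flank + 1 with K at its center.
    The flank size and padding character are illustrative assumptions.
    """
    seq = sequence.upper()
    # Pad both ends so lysines near the termini still get full-length windows.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i of seq sits at index i + flank in padded,
            # so the window centered on it starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Hypothetical toy sequence, not from any data set in the paper.
    for position, window in lysine_windows("MKTAYIAKQR"):
        print(position, window)
```

Each window could then be passed to a sequence encoder, such as the species-specific BERT models the abstract mentions, before a downstream classifier; those steps are not shown here.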
format Online
Article
Text
id pubmed-10634282
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10634282 2023-11-15 MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy. Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei. ACS Omega. American Chemical Society 2023-10-25 /pmc/articles/PMC10634282/ /pubmed/37969991 http://dx.doi.org/10.1021/acsomega.3c07086 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works.
title MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634282/
https://www.ncbi.nlm.nih.gov/pubmed/37969991
http://dx.doi.org/10.1021/acsomega.3c07086