MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy
As one of the most important post-translational modifications (PTM), lysine acetylation (Kace) plays an important role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. Instead, several machine learning meth...
Main Authors: | Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634282/ https://www.ncbi.nlm.nih.gov/pubmed/37969991 http://dx.doi.org/10.1021/acsomega.3c07086 |
_version_ | 1785146199001530368 |
---|---|
author | Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei |
author_facet | Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei |
author_sort | Wang, Gang-Ao |
collection | PubMed |
description | As one of the most important post-translational modifications (PTM), lysine acetylation (Kace) plays an important role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. Instead, several machine learning methods have been developed for Kace site prediction, and hand-crafted features have been used to encode the protein sequences. However, two challenges remain: the complex biological information may be under-represented by these hand-crafted features, and the small-sample issue of some species needs to be addressed. We propose a novel model, MSTL-Kace, which was developed based on a transfer learning strategy with a pretrained bidirectional encoder representations from transformers (BERT) model. In this model, high-level embeddings were extracted from species-specific BERT models, and a two-stage fine-tuning strategy was used to address the small-sample issue. Specifically, a domain-specific BERT model was pretrained using all of the sequences in our data sets and was then fine-tuned, or two-stage fine-tuned, on the training data set of each species to obtain the species-specific BERT models. Afterward, the embeddings of residues were extracted from the fine-tuned model and fed to different downstream learning algorithms. After comparison, the best model for the six prokaryotic species was built by using a random forest. The results for the independent test sets show that our model outperforms the state-of-the-art methods on all six species. The source code and data for MSTL-Kace are available at https://github.com/leo97king/MSTL-Kace. |
format | Online Article Text |
id | pubmed-10634282 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10634282 2023-11-15 MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei ACS Omega American Chemical Society 2023-10-25 /pmc/articles/PMC10634282/ /pubmed/37969991 http://dx.doi.org/10.1021/acsomega.3c07086 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Wang, Gang-Ao; Yan, Xiaodi; Li, Xiang; Liu, Yinbo; Xia, Junfeng; Zhu, Xiaolei MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy |
title | MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy |
title_full | MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy |
title_fullStr | MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy |
title_full_unstemmed | MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy |
title_short | MSTL-Kace: Prediction of Prokaryotic Lysine Acetylation Sites Based on Multistage Transfer Learning Strategy |
title_sort | mstl-kace: prediction of prokaryotic lysine acetylation sites based on multistage transfer learning strategy |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634282/ https://www.ncbi.nlm.nih.gov/pubmed/37969991 http://dx.doi.org/10.1021/acsomega.3c07086 |
work_keys_str_mv | AT wanggangao mstlkacepredictionofprokaryoticlysineacetylationsitesbasedonmultistagetransferlearningstrategy AT yanxiaodi mstlkacepredictionofprokaryoticlysineacetylationsitesbasedonmultistagetransferlearningstrategy AT lixiang mstlkacepredictionofprokaryoticlysineacetylationsitesbasedonmultistagetransferlearningstrategy AT liuyinbo mstlkacepredictionofprokaryoticlysineacetylationsitesbasedonmultistagetransferlearningstrategy AT xiajunfeng mstlkacepredictionofprokaryoticlysineacetylationsitesbasedonmultistagetransferlearningstrategy AT zhuxiaolei mstlkacepredictionofprokaryoticlysineacetylationsitesbasedonmultistagetransferlearningstrategy |
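
The description field above says that residue embeddings are extracted for Kace site prediction, which presupposes a step that locates candidate lysine (K) residues in each protein sequence. The sketch below illustrates only that input-side step and is not taken from the MSTL-Kace repository. The record does not state how sequences are segmented, so the function name `lysine_windows`, the window half-width, and the padding character `X` are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def lysine_windows(
    sequence: str, half_width: int = 3, pad: str = "X"
) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in `sequence`.

    Hypothetical helper: the window size and padding symbol are assumptions,
    not values reported in the catalog record.
    """
    # Pad both ends so lysines near a terminus still get a full-length window.
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sits at index i + half_width in `padded`,
            # so the slice below is centered on it.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


# Toy sequence, not real data.
example = lysine_windows("MKTAYIAKQR", half_width=3)
print(example)
```

Each window has length `2 * half_width + 1` with the candidate lysine at its center. In a pipeline like the one described, such windows (or the full sequence) would be passed to the fine-tuned encoder, and the resulting embeddings to a downstream classifier such as a random forest.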