
RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins

Bibliographic Details
Main authors: Peng, Xinxin; Wang, Xiaoyu; Guo, Yuming; Ge, Zongyuan; Li, Fuyi; Gao, Xin; Song, Jiangning
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9294422/
https://www.ncbi.nlm.nih.gov/pubmed/35649392
http://dx.doi.org/10.1093/bib/bbac215

Abstract
RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in many biological processes, such as RNA localization and gene regulation. Computational methods that can accurately identify RBPs are therefore highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose RBP-TSTL, a two-stage deep transfer learning framework for accurate RBP prediction. In the first stage, knowledge from a self-supervised pre-trained model was extracted as feature embeddings that represent the protein sequences. In the second stage, a customized deep learning model was initialized on an annotated RBP pre-training dataset and then fine-tuned on the dataset of each target species. This two-stage knowledge transfer allows the RBP-TSTL models to be trained effectively and improves their prediction performance. Extensive benchmarking of RBP-TSTL models trained on features generated by the self-supervised pre-trained model against models trained on hand-crafted encoding features demonstrated the effectiveness of the proposed strategy. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli and Salmonella, and established a computational compendium of all predicted putative RBP candidates. We anticipate that RBP-TSTL will be a useful tool for characterizing RNA-binding proteins and exploring their sequence–structure–function relationships.
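
The two-stage knowledge transfer described in the abstract can be illustrated with a small, stdlib-only Python sketch. It is not the RBP-TSTL implementation and makes several loud assumptions: the synthetic vectors produced by the hypothetical `make_dataset` helper stand in for stage-1 embeddings from a self-supervised pre-trained protein model, a logistic-regression classifier stands in for the customized deep learning model of stage 2, and the protein IDs, dataset sizes, embedding dimension and hyperparameters are all invented for illustration. What it does show is the transfer mechanism: parameters learned on a large annotated pre-training set initialize training on a smaller target-species set, and the fine-tuned model then scores every protein in a toy proteome.

```python
import math
import random

DIM = 8  # assumed embedding size; real protein language-model embeddings are much larger


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Predicted probability that the protein with embedding x is an RBP."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(data, w=None, b=0.0, epochs=200, lr=0.1):
    """Full-batch gradient descent on the logistic (cross-entropy) loss.

    Passing existing w and b warm-starts training, which is how the
    fine-tuning step reuses the parameters learned during pre-training.
    """
    w = list(w) if w is not None else [0.0] * DIM
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * DIM
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y  # gradient of the loss w.r.t. the logit
            for i in range(DIM):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_dataset(rng, n, shift):
    """Hypothetical stand-in for stage-1 embeddings: RBPs (label 1) are shifted by `shift`."""
    data = []
    for k in range(n):
        y = k % 2  # balanced classes
        x = [rng.gauss(shift[i] if y else 0.0, 1.0) for i in range(DIM)]
        data.append((x, y))
    return data


def accuracy(w, b, data, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum((predict(w, b, x) >= threshold) == bool(y) for x, y in data)
    return hits / len(data)


def screen_proteome(w, b, proteome, threshold=0.5):
    """Genome-scale step: return the sorted IDs of proteins predicted to be RBPs."""
    return sorted(pid for pid, x in proteome.items() if predict(w, b, x) >= threshold)


rng = random.Random(0)
general_shift = [1.5] * DIM                              # large annotated pre-training set
species_shift = [1.5] * (DIM // 2) + [0.5] * (DIM // 2)  # related but different target species

pretrain_set = make_dataset(rng, 400, general_shift)
species_train = make_dataset(rng, 40, species_shift)     # small species-specific training set
species_test = make_dataset(rng, 200, species_shift)

# Stage 2a: initialize the classifier on the annotated pre-training dataset.
w_pre, b_pre = train_logreg(pretrain_set)
# Stage 2b: fine-tune the pre-trained parameters on the target-species dataset.
w_ft, b_ft = train_logreg(species_train, w=w_pre, b=b_pre, epochs=50, lr=0.05)

print(f"pre-trained only: {accuracy(w_pre, b_pre, species_test):.2f}")
print(f"fine-tuned:       {accuracy(w_ft, b_ft, species_test):.2f}")

# Screen a toy "proteome" of unlabeled embeddings with hypothetical protein IDs.
proteome = {f"PROT{k:04d}": x for k, (x, _) in enumerate(make_dataset(rng, 20, species_shift))}
hits = screen_proteome(w_ft, b_ft, proteome)
print(f"{len(hits)} of {len(proteome)} proteins predicted as putative RBPs")
```

Because `train_logreg` accepts existing weights, fine-tuning here is simply continued gradient descent from the pre-trained parameters with a smaller learning rate and fewer epochs; a deep-learning implementation would instead fine-tune the weights of a neural network initialized from the pre-training stage.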

Journal: Briefings in Bioinformatics (Brief Bioinform), 2022-06-02
Article type: Problem Solving Protocol
License: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.