Prediction of RNA–protein interactions using a nucleotide language model
MOTIVATION: The accumulation of sequencing data has enabled researchers to predict the interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to sequences...
Main authors: | Yamada, Keisuke; Hamada, Michiaki |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710633/ https://www.ncbi.nlm.nih.gov/pubmed/36699410 http://dx.doi.org/10.1093/bioadv/vbac023 |
_version_ | 1784841408567312384 |
---|---|
author | Yamada, Keisuke Hamada, Michiaki |
author_facet | Yamada, Keisuke Hamada, Michiaki |
author_sort | Yamada, Keisuke |
collection | PubMed |
description | MOTIVATION: The accumulation of sequencing data has enabled researchers to predict the interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to sequences. Bidirectional encoder representations from transformers (BERT) is a language-based deep learning model that is highly interpretable. Therefore, a model based on the BERT architecture can potentially overcome such limitations. RESULTS: Here, we propose BERT-RBP as a model to predict RNA–RBP interactions by adapting the BERT architecture pretrained on a human reference genome. Our model outperformed state-of-the-art prediction models using the eCLIP-seq data of 154 RBPs. The detailed analysis further revealed that BERT-RBP could recognize both the transcript region type and RNA secondary structure based only on sequence information. Overall, the results provide insights into the fine-tuning mechanism of BERT in biological contexts and provide evidence of the applicability of the model to other RNA-related problems. AVAILABILITY AND IMPLEMENTATION: Python source code is freely available at https://github.com/kkyamada/bert-rbp. The datasets underlying this article were derived from sources in the public domain: [RBPsuite (http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/), Ensembl Biomart (http://asia.ensembl.org/biomart/martview/)]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. |
format | Online Article Text |
id | pubmed-9710633 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9710633 2023-01-24 Prediction of RNA–protein interactions using a nucleotide language model Yamada, Keisuke Hamada, Michiaki Bioinform Adv Original Paper Oxford University Press 2022-04-07 /pmc/articles/PMC9710633/ /pubmed/36699410 http://dx.doi.org/10.1093/bioadv/vbac023 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Yamada, Keisuke Hamada, Michiaki Prediction of RNA–protein interactions using a nucleotide language model |
title | Prediction of RNA–protein interactions using a nucleotide language model |
title_sort | prediction of rna–protein interactions using a nucleotide language model |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710633/ https://www.ncbi.nlm.nih.gov/pubmed/36699410 http://dx.doi.org/10.1093/bioadv/vbac023 |
work_keys_str_mv | AT yamadakeisuke predictionofrnaproteininteractionsusinganucleotidelanguagemodel AT hamadamichiaki predictionofrnaproteininteractionsusinganucleotidelanguagemodel |