Prediction of RNA–protein interactions using a nucleotide language model

MOTIVATION: The accumulation of sequencing data has enabled researchers to predict the interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to sequences...

Bibliographic Details
Main authors: Yamada, Keisuke; Hamada, Michiaki
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710633/
https://www.ncbi.nlm.nih.gov/pubmed/36699410
http://dx.doi.org/10.1093/bioadv/vbac023
author Yamada, Keisuke
Hamada, Michiaki
collection PubMed
description MOTIVATION: The accumulation of sequencing data has enabled researchers to predict the interactions between RNA sequences and RNA-binding proteins (RBPs) using novel machine learning techniques. However, existing models are often difficult to interpret and require information in addition to sequences. Bidirectional encoder representations from transformers (BERT) is a language-based deep learning model that is highly interpretable. Therefore, a model based on the BERT architecture can potentially overcome such limitations. RESULTS: Here, we propose BERT-RBP as a model to predict RNA–RBP interactions by adapting the BERT architecture pretrained on a human reference genome. Our model outperformed state-of-the-art prediction models using the eCLIP-seq data of 154 RBPs. The detailed analysis further revealed that BERT-RBP could recognize both the transcript region type and RNA secondary structure based on sequence information alone. Overall, the results provide insights into the fine-tuning mechanism of BERT in biological contexts and provide evidence of the applicability of the model to other RNA-related problems. AVAILABILITY AND IMPLEMENTATION: Python source code is freely available at https://github.com/kkyamada/bert-rbp. The datasets underlying this article were derived from sources in the public domain: RBPsuite (http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/) and Ensembl Biomart (http://asia.ensembl.org/biomart/martview/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-9710633
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9710633 2023-01-24 Prediction of RNA–protein interactions using a nucleotide language model Yamada, Keisuke Hamada, Michiaki Bioinform Adv Original Paper Oxford University Press 2022-04-07 /pmc/articles/PMC9710633/ /pubmed/36699410 http://dx.doi.org/10.1093/bioadv/vbac023 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction of RNA–protein interactions using a nucleotide language model
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710633/
https://www.ncbi.nlm.nih.gov/pubmed/36699410
http://dx.doi.org/10.1093/bioadv/vbac023