
Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure

BACKGROUND: RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanis...


Bibliographic Details
Main authors: Deng, Lei; Liu, Youzhi; Shi, Yechuan; Zhang, Wenhao; Yang, Chun; Liu, Hui
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745412/
https://www.ncbi.nlm.nih.gov/pubmed/33334313
http://dx.doi.org/10.1186/s12864-020-07239-w
_version_ 1783624600342495232
author Deng, Lei
Liu, Youzhi
Shi, Yechuan
Zhang, Wenhao
Yang, Chun
Liu, Hui
collection PubMed
description BACKGROUND: RNA-binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. Identifying RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, determining RBP binding sites on a large scale is challenging because of the high cost of biochemical assays. A number of studies have used machine learning methods to predict binding sites. In particular, deep learning is increasingly used in bioinformatics by virtue of its ability to learn generalized representations from DNA and protein sequences. RESULTS: In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. CONCLUSIONS: Our extensive experimental results show that DeepRKE is an effective tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can capture the latent relationships and similarity between k-mers, and thus improve predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (doi:10.1186/s12864-020-07239-w).
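The abstract's contrast between k-mer "words" and one-hot encoding can be illustrated with a minimal sketch. It is not taken from the DeepRKE repository: the function names, the choice of k = 3, the stride of 1, and the T-to-U normalization are all assumptions made for illustration. The sketch only shows how a sequence might be split into overlapping k-mers and mapped to integer IDs, which a word embedding algorithm could then turn into dense vectors.

```python
from typing import Dict, Iterable, List


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a sequence into overlapping k-mers (hypothetical helper).

    k and stride are illustrative defaults, not values from the paper.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer ID in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map k-mers to IDs; each ID would index one row of an embedding matrix."""
    return [vocab[token] for token in tokens]


if __name__ == "__main__":
    corpus = [kmer_tokens("AUGGCU"), kmer_tokens("GGCUA")]
    vocab = build_vocab(corpus)
    for tokens in corpus:
        print(tokens, encode(tokens, vocab))
```

Under these assumptions, two sequences that share k-mers such as GGC and GCU reuse the same IDs, so an embedding trained on those IDs can place related k-mers close together, whereas one-hot vectors treat every symbol as equally distant from every other.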
format Online
Article
Text
id pubmed-7745412
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7745412 2020-12-18 Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure Deng, Lei Liu, Youzhi Shi, Yechuan Zhang, Wenhao Yang, Chun Liu, Hui BMC Genomics Research BioMed Central 2020-12-17 /pmc/articles/PMC7745412/ /pubmed/33334313 http://dx.doi.org/10.1186/s12864-020-07239-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745412/
https://www.ncbi.nlm.nih.gov/pubmed/33334313
http://dx.doi.org/10.1186/s12864-020-07239-w