Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure
BACKGROUND: RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanis...
Main authors: | Deng, Lei; Liu, Youzhi; Shi, Yechuan; Zhang, Wenhao; Yang, Chun; Liu, Hui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745412/ https://www.ncbi.nlm.nih.gov/pubmed/33334313 http://dx.doi.org/10.1186/s12864-020-07239-w |
_version_ | 1783624600342495232 |
---|---|
author | Deng, Lei Liu, Youzhi Shi, Yechuan Zhang, Wenhao Yang, Chun Liu, Hui |
author_sort | Deng, Lei |
collection | PubMed |
description | BACKGROUND: RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, determining RBP binding sites on a large scale is challenging because of the high cost of biochemical assays. Many studies have exploited machine learning methods to predict binding sites. In particular, deep learning is increasingly used in bioinformatics by virtue of its ability to learn generalized representations from DNA and protein sequences. RESULTS: In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., distributed representations of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. CONCLUSIONS: Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (doi:10.1186/s12864-020-07239-w). |
format | Online Article Text |
id | pubmed-7745412 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7745412 2020-12-18 BMC Genomics Research BioMed Central 2020-12-17 /pmc/articles/PMC7745412/ /pubmed/33334313 http://dx.doi.org/10.1186/s12864-020-07239-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745412/ https://www.ncbi.nlm.nih.gov/pubmed/33334313 http://dx.doi.org/10.1186/s12864-020-07239-w |
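
The abstract describes representing RNA as k-mers that are mapped to distributed (embedding) vectors instead of one-hot vectors. The sketch below illustrates only the tokenization step that such an approach would start from. It is not taken from the DeepRKE source code: the function names, the choice of k = 3, the stride of 1, and the `ACGU` alphabet are all assumptions made for illustration.

```python
from itertools import product


def kmers(seq, k=3, stride=1):
    """Split a sequence into overlapping k-mers (k and stride are assumed values)."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(alphabet="ACGU", k=3):
    """Assign an integer id to every possible k-mer over the alphabet."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, vocab, k=3):
    """Turn a sequence into a list of k-mer ids, e.g. rows of an embedding table."""
    return [vocab[m] for m in kmers(seq, k)]


seq = "AUGGCUA"            # toy sequence, not from the benchmark datasets
vocab = build_vocab()      # 4**3 = 64 possible 3-mers
tokens = kmers(seq)
ids = encode(seq, vocab)
print(tokens)
print(ids)
```

Each id would then select one row of a learned embedding matrix, so that k-mers appearing in similar contexts can end up with similar vectors. A one-hot encoding, by contrast, treats every k-mer as equally distant from every other.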