RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach
BACKGROUND: RNAs play key roles in cells through interactions with proteins known as RNA-binding proteins (RBPs), and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs and why they bind specific po...
Main authors: | Pan, Xiaoyong; Shen, Hong-Bin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2017
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5331642/ https://www.ncbi.nlm.nih.gov/pubmed/28245811 http://dx.doi.org/10.1186/s12859-017-1561-8 |
_version_ | 1782511418560479232 |
---|---|
author | Pan, Xiaoyong Shen, Hong-Bin |
author_facet | Pan, Xiaoyong Shen, Hong-Bin |
author_sort | Pan, Xiaoyong |
collection | PubMed |
description | BACKGROUND: RNAs play key roles in cells through interactions with proteins known as RNA-binding proteins (RBPs), and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence and structure, their domain-specific features and formats have posed significant computational challenges. One of the current difficulties is that the common knowledge shared across sources lies at a higher abstraction level than the observed data, so directly integrating observed data across domains is inefficient. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into meaningful binding motifs is a topic worthy of further investigation. RESULTS: In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid convolutional neural network and deep belief network to predict RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated.
To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting iDeep is a promising approach for real-world applications. CONCLUSION: The iDeep framework not only achieves better performance than the state-of-the-art predictors, but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1561-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5331642 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5331642 2017-03-03 RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach Pan, Xiaoyong Shen, Hong-Bin BMC Bioinformatics Research Article BACKGROUND: RNAs play key roles in cells through interactions with proteins known as RNA-binding proteins (RBPs), and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence and structure, their domain-specific features and formats have posed significant computational challenges. One of the current difficulties is that the common knowledge shared across sources lies at a higher abstraction level than the observed data, so directly integrating observed data across domains is inefficient. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into meaningful binding motifs is a topic worthy of further investigation. RESULTS: In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid convolutional neural network and deep belief network to predict RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated.
To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting iDeep is a promising approach for real-world applications. CONCLUSION: The iDeep framework not only achieves better performance than the state-of-the-art predictors, but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1561-8) contains supplementary material, which is available to authorized users. BioMed Central 2017-02-28 /pmc/articles/PMC5331642/ /pubmed/28245811 http://dx.doi.org/10.1186/s12859-017-1561-8 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Pan, Xiaoyong Shen, Hong-Bin RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach |
title | RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach |
title_full | RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach |
title_fullStr | RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach |
title_full_unstemmed | RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach |
title_short | RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach |
title_sort | rna-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5331642/ https://www.ncbi.nlm.nih.gov/pubmed/28245811 http://dx.doi.org/10.1186/s12859-017-1561-8 |
work_keys_str_mv | AT panxiaoyong rnaproteinbindingmotifsminingwithanewhybriddeeplearningbasedcrossdomainknowledgeintegrationapproach AT shenhongbin rnaproteinbindingmotifsminingwithanewhybriddeeplearningbasedcrossdomainknowledgeintegrationapproach |
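
The abstract in this record states that the convolutional neural network module can capture interpretable binding motifs. As a rough illustration of why a convolution filter can be read as a motif detector, the sketch below slides one hand-made filter over a one-hot encoded RNA sequence and reports the best-scoring window. It is not the iDeep implementation: the function names (`one_hot`, `motif_filter`, `scan`, `best_hit`), the toy motif `UGCAUG`, the example sequence, and the +1/-1 filter weights are all assumptions chosen for illustration, whereas a trained network would learn its filter weights from data.

```python
# Illustrative only: a single convolution filter sliding over a one-hot
# encoded RNA sequence, which is how a CNN layer can act as a motif scanner.
# The filter weights are hand-made for a toy motif; they are not taken from
# iDeep or from any trained model.

ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA string as a list of 4-element 0/1 rows (A, C, G, U)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def motif_filter(motif, match=1.0, mismatch=-1.0):
    """Build a filter that rewards each motif base and penalises the others."""
    return [[match if base == letter else mismatch for letter in ALPHABET] for base in motif]


def scan(seq, filt):
    """Valid (no padding) 1-D cross-correlation of the filter with the sequence."""
    encoded = one_hot(seq)
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(w * x for frow, xrow in zip(filt, window) for w, x in zip(frow, xrow))
        scores.append(score)
    return scores


def best_hit(seq, filt):
    """Return (position, score) of the highest-scoring window (global max pooling)."""
    scores = scan(seq, filt)
    position = max(range(len(scores)), key=scores.__getitem__)
    return position, scores[position]


if __name__ == "__main__":
    rna = "AACGUGCAUGUUAC"
    print(best_hit(rna, motif_filter("UGCAUG")))
```

With these weights each window scores the number of matching bases minus the number of mismatching ones, so a perfect match to a six-base motif scores 6.0. Reading motifs out of a trained network works in the opposite direction: the windows that most strongly activate a learned filter are collected and aligned to form a position weight matrix.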