RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach

Bibliographic Details
Main Authors: Pan, Xiaoyong; Shen, Hong-Bin
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5331642/
https://www.ncbi.nlm.nih.gov/pubmed/28245811
http://dx.doi.org/10.1186/s12859-017-1561-8

Full description

BACKGROUND: RNAs play key roles in cells through their interactions with proteins known as RNA-binding proteins (RBPs), and the binding motifs of these proteins are crucial to understanding the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs, and why they bind at specific positions, is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict RNA-protein binding sites from rapidly growing multi-source data (e.g. sequence and structure), the domain-specific features and formats of these data pose significant computational challenges. One current difficulty is that the knowledge shared across sources lies at a higher abstraction level than the observed data, so directly integrating the observed data across domains is inefficient. Another difficulty is how to interpret the prediction results. Existing approaches tend to stop after outputting potential discrete binding sites on the sequences, but how to assemble these sites into meaningful binding motifs is a topic worthy of further investigation.

RESULTS: In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid of a convolutional neural network and a deep belief network to predict RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstract feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate iDeep, we performed experiments on 31 large-scale CLIP-seq datasets. Our results show that integrating multiple sources of data improves the average AUC by 8% compared with the best single-source-based predictor, and that through cross-domain knowledge integration at an abstraction level, iDeep outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep can also automatically capture interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with experimentally verified results, suggesting that iDeep is a promising approach for real-world applications.

CONCLUSION: The iDeep framework not only achieves better performance than the state-of-the-art predictors but also readily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1561-8) contains supplementary material, which is available to authorized users.
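The abstract says that iDeep's convolutional module captures interpretable binding motifs but does not say how. The following is a hypothetical plain-Python sketch of the general idea, not iDeep's code: an RNA sequence is one-hot encoded over A, C, G, U; a single convolutional filter scans it with a ReLU and global max pooling; and the windows whose activation reaches a threshold are stacked into a position count matrix, which is one common way to read a motif out of a filter. The filter weights, bias, threshold, example sequence, and every function name (including the helper `kernel_from_consensus`) are illustrative assumptions; in a trained CNN the filter would be learned from CLIP-seq data rather than hand-set.

```python
# Hypothetical sketch, NOT iDeep's implementation: one hand-set convolutional
# filter scanning one-hot RNA, then turning its activations into a motif.
from typing import Dict, List

ALPHABET = "ACGU"  # assumed channel order for the one-hot encoding


def one_hot(seq: str) -> List[List[float]]:
    """Encode an RNA sequence as one 4-dim row per base (A, C, G, U)."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0.0] * len(ALPHABET)
        if base in ALPHABET:  # unknown bases such as N stay all-zero
            row[ALPHABET.index(base)] = 1.0
        rows.append(row)
    return rows


def kernel_from_consensus(motif: str, match: float = 1.0,
                          mismatch: float = -1.0) -> List[List[float]]:
    """Hypothetical helper: a filter that rewards one base per position.
    A trained CNN would learn these weights instead."""
    return [[match if b == base else mismatch for b in ALPHABET] for base in motif]


def conv1d_relu(x: List[List[float]], kernel: List[List[float]],
                bias: float = 0.0) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers) + ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for offset in range(width):
            for ch in range(len(ALPHABET)):
                s += x[start + offset][ch] * kernel[offset][ch]
        out.append(max(0.0, s))  # ReLU
    return out


def max_pool(activations: List[float]) -> float:
    """Global max pooling: the strongest match anywhere in the sequence."""
    return max(activations, default=0.0)


def activated_windows(seq: str, activations: List[float], width: int,
                      threshold: float) -> List[str]:
    """Subsequences whose filter activation reaches the threshold."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + width] for i, a in enumerate(activations) if a >= threshold]


def count_matrix(windows: List[str]) -> List[Dict[str, int]]:
    """Position count matrix of equal-length aligned windows."""
    width = len(windows[0]) if windows else 0
    counts = [{b: 0 for b in ALPHABET} for _ in range(width)]
    for w in windows:
        for i, base in enumerate(w):
            if base in counts[i]:
                counts[i][base] += 1
    return counts


def consensus(counts: List[Dict[str, int]]) -> str:
    """Most frequent base at each position of the count matrix."""
    return "".join(max(col, key=col.get) for col in counts)


if __name__ == "__main__":
    seq = "GGAUUUUUAGCUUUUUCC"  # made-up example sequence
    kernel = kernel_from_consensus("UUUUU")
    acts = conv1d_relu(one_hot(seq), kernel, bias=-2.0)
    print("max-pooled score:", max_pool(acts))
    windows = activated_windows(seq, acts, len(kernel), threshold=1.0)
    print("windows:", windows)
    print("consensus:", consensus(count_matrix(windows)))
```

With the hand-set UUUUU filter, a bias of -2.0, and a threshold of 1.0, the two exact U-runs score 3.0 and four near-matches score 1.0; the six windows stack into a count matrix whose consensus is UUUUU, with A and C appearing at the flanking positions. A learned filter would usually be fuzzier than this one, which is why a count or frequency matrix is more informative than the consensus string alone.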

Journal: BMC Bioinformatics, Research Article (BioMed Central, 2017-02-28)
Rights: © The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.