RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data

Motivation: Protein–RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein–RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer si...

Bibliographic Details
Main Authors:	Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	ISMB 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908343/ https://www.ncbi.nlm.nih.gov/pubmed/27307637 http://dx.doi.org/10.1093/bioinformatics/btw259

_version_	1782437663756779520
author	Orenstein, Yaron Wang, Yuhao Berger, Bonnie
author_facet	Orenstein, Yaron Wang, Yuhao Berger, Bonnie
author_sort	Orenstein, Yaron
collection	PubMed
description	Motivation: Protein–RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein–RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein–RNA binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNAcompete dataset. Results: We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein–RNA structure-based models on an unprecedented scale. 
Availability and Implementation: Software and models are freely available at http://rck.csail.mit.edu/ Contact: bab@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-4908343
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-49083432016-06-17 RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data Orenstein, Yaron Wang, Yuhao Berger, Bonnie Bioinformatics Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida Motivation: Protein–RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein–RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. Results: We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. 
By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein–RNA structure-based models on an unprecedented scale. Availability and Implementation: Software and models are freely available at http://rck.csail.mit.edu/ Contact: bab@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-06-15 2016-06-11 /pmc/articles/PMC4908343/ /pubmed/27307637 http://dx.doi.org/10.1093/bioinformatics/btw259 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida Orenstein, Yaron Wang, Yuhao Berger, Bonnie RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data
title	RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data
title_full	RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data
title_fullStr	RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data
title_full_unstemmed	RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data
title_short	RCK: accurate and efficient inference of sequence- and structure-based protein–RNA binding models from RNAcompete data
title_sort	rck: accurate and efficient inference of sequence- and structure-based protein–rna binding models from rnacompete data
topic	Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908343/ https://www.ncbi.nlm.nih.gov/pubmed/27307637 http://dx.doi.org/10.1093/bioinformatics/btw259
work_keys_str_mv	AT orensteinyaron rckaccurateandefficientinferenceofsequenceandstructurebasedproteinrnabindingmodelsfromrnacompetedata AT wangyuhao rckaccurateandefficientinferenceofsequenceandstructurebasedproteinrnabindingmodelsfromrnacompetedata AT bergerbonnie rckaccurateandefficientinferenceofsequenceandstructurebasedproteinrnabindingmodelsfromrnacompetedata