SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data

RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interaction...

Bibliographic Details
Main authors: Dotu, Ivan, Adamson, Scott I., Coleman, Benjamin, Fournier, Cyril, Ricart-Altimiras, Emma, Eyras, Eduardo, Chuang, Jeffrey H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5892938/
https://www.ncbi.nlm.nih.gov/pubmed/29596423
http://dx.doi.org/10.1371/journal.pcbi.1006078
_version_ 1783313236830978048
author Dotu, Ivan
Adamson, Scott I.
Coleman, Benjamin
Fournier, Cyril
Ricart-Altimiras, Emma
Eyras, Eduardo
Chuang, Jeffrey H.
author_sort Dotu, Ivan
collection PubMed
description RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC.
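The description above reports clustering quality as a V-measure. As a reading aid, the following is a minimal sketch of how that metric is conventionally defined: the harmonic mean of homogeneity and completeness, each derived from conditional entropies. It is not taken from the SARNAclust code; the function names and the toy label lists are assumptions made for illustration only and do not reproduce any result from the article.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from two paired label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    h_true = entropy(true_labels)
    h_clust = entropy(cluster_labels)
    # Homogeneity: each cluster contains members of a single true class.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    # Completeness: all members of a true class fall in the same cluster.
    completeness = (
        1.0 if h_clust == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_clust
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy data: six items drawn from three made-up motif classes.
toy_true = ["m1", "m1", "m2", "m2", "m3", "m3"]
toy_relabelled = [2, 2, 0, 0, 1, 1]   # same grouping, different cluster names
toy_merged = [0, 0, 0, 0, 0, 0]       # everything lumped into one cluster

score_relabelled = v_measure(toy_true, toy_relabelled)
score_merged = v_measure(toy_true, toy_merged)
```

Because the score depends only on how items are grouped, renaming clusters leaves it unchanged, whereas lumping every item into a single cluster drives homogeneity, and with it the V-measure, to zero.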
format Online
Article
Text
id pubmed-5892938
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5892938 2018-04-20 SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data Dotu, Ivan Adamson, Scott I. Coleman, Benjamin Fournier, Cyril Ricart-Altimiras, Emma Eyras, Eduardo Chuang, Jeffrey H. PLoS Comput Biol Research Article Public Library of Science 2018-03-29 /pmc/articles/PMC5892938/ /pubmed/29596423 http://dx.doi.org/10.1371/journal.pcbi.1006078 Text en © 2018 Dotu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data
topic Research Article