ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA moti...

Bibliographic Details
Main authors: Heller, David; Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737366/
https://www.ncbi.nlm.nih.gov/pubmed/28977546
http://dx.doi.org/10.1093/nar/gkx756
_version_ 1783287505123016704
author Heller, David
Krestel, Ralf
Ohler, Uwe
Vingron, Martin
Marsico, Annalisa
author_facet Heller, David
Krestel, Ralf
Ohler, Uwe
Vingron, Martin
Marsico, Annalisa
author_sort Heller, David
collection PubMed
description RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account or employ models that are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling, which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. In contrast to previous methods, which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on GitHub and as a Docker image.
format Online
Article
Text
id pubmed-5737366
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-57373662018-01-08 ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data Heller, David Krestel, Ralf Ohler, Uwe Vingron, Martin Marsico, Annalisa Nucleic Acids Res Computational Biology RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. Oxford University Press 2017-11-02 2017-08-30 /pmc/articles/PMC5737366/ /pubmed/28977546 http://dx.doi.org/10.1093/nar/gkx756 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Computational Biology
Heller, David
Krestel, Ralf
Ohler, Uwe
Vingron, Martin
Marsico, Annalisa
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
title ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737366/
https://www.ncbi.nlm.nih.gov/pubmed/28977546
http://dx.doi.org/10.1093/nar/gkx756