Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models

We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several obje...

Bibliographic Details
Main authors: Maaskola, Jonas; Rajewsky, Nikolaus
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4245949/
https://www.ncbi.nlm.nih.gov/pubmed/25389269
http://dx.doi.org/10.1093/nar/gku1083
author Maaskola, Jonas
Rajewsky, Nikolaus
collection PubMed
description We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments, and can handle more complex data configurations in addition to binary contrasts.
format Online
Article
Text
id pubmed-4245949
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4245949 2014-12-01
Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models
Maaskola, Jonas
Rajewsky, Nikolaus
Nucleic Acids Res
Computational Biology
Oxford University Press 2014-12-01 2014-11-11
/pmc/articles/PMC4245949/ /pubmed/25389269 http://dx.doi.org/10.1093/nar/gku1083
Text en
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models
topic Computational Biology