
A general approach for discriminative de novo motif discovery from high-throughput data

De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data...


Bibliographic Details
Main authors: Grau, Jan; Posch, Stefan; Grosse, Ivo; Keilwagen, Jens
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3834837/
https://www.ncbi.nlm.nih.gov/pubmed/24057214
http://dx.doi.org/10.1093/nar/gkt831
collection PubMed
description De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.
id pubmed-3834837
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Nucleic Acids Res
dates 2013-11-21, 2013-11, 2013-09-19
rights © The Author(s) 2013. Published by Oxford University Press.
license This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Methods Online