
Discovering Motifs in Ranked Lists of DNA Sequences

Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitatio...


Bibliographic Details
Main authors: Eden, Eran, Lipson, Doron, Yogev, Sivan, Yakhini, Zohar
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1829477/
https://www.ncbi.nlm.nih.gov/pubmed/17381235
http://dx.doi.org/10.1371/journal.pcbi.0030039
_version_ 1782132762449281024
author Eden, Eran
Lipson, Doron
Yogev, Sivan
Yakhini, Zohar
author_facet Eden, Eran
Lipson, Doron
Yogev, Sivan
Yakhini, Zohar
author_sort Eden, Eran
collection PubMed
description Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
format Text
id pubmed-1829477
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-18294772007-03-23 Discovering Motifs in Ranked Lists of DNA Sequences Eden, Eran Lipson, Doron Yogev, Sivan Yakhini, Zohar PLoS Comput Biol Research Article Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. 
Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim. Public Library of Science 2007-03 2007-03-23 /pmc/articles/PMC1829477/ /pubmed/17381235 http://dx.doi.org/10.1371/journal.pcbi.0030039 Text en © 2007 Eden et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Eden, Eran
Lipson, Doron
Yogev, Sivan
Yakhini, Zohar
Discovering Motifs in Ranked Lists of DNA Sequences
title Discovering Motifs in Ranked Lists of DNA Sequences
title_full Discovering Motifs in Ranked Lists of DNA Sequences
title_fullStr Discovering Motifs in Ranked Lists of DNA Sequences
title_full_unstemmed Discovering Motifs in Ranked Lists of DNA Sequences
title_short Discovering Motifs in Ranked Lists of DNA Sequences
title_sort discovering motifs in ranked lists of dna sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1829477/
https://www.ncbi.nlm.nih.gov/pubmed/17381235
http://dx.doi.org/10.1371/journal.pcbi.0030039
work_keys_str_mv AT edeneran discoveringmotifsinrankedlistsofdnasequences
AT lipsondoron discoveringmotifsinrankedlistsofdnasequences
AT yogevsivan discoveringmotifsinrankedlistsofdnasequences
AT yakhinizohar discoveringmotifsinrankedlistsofdnasequences