De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets
BACKGROUND: In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although the knowledge of CREs and CRMs in a genome i...
Main Authors: | Niu, Meng; Tabari, Ehsan S; Su, Zhengchang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4265420/ https://www.ncbi.nlm.nih.gov/pubmed/25442502 http://dx.doi.org/10.1186/1471-2164-15-1047 |
_version_ | 1782348885372436480 |
---|---|
author | Niu, Meng Tabari, Ehsan S Su, Zhengchang |
author_facet | Niu, Meng Tabari, Ehsan S Su, Zhengchang |
author_sort | Niu, Meng |
collection | PubMed |
description | BACKGROUND: In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial for elucidating gene regulatory networks and understanding many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes because they are difficult to characterize by either computational or traditional experimental methods. However, the exponentially increasing volume of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) technologies has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, effectively mining these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution is a highly challenging task. RESULTS: We have developed a novel graph-theoretic algorithm, DePCRM, for genome-wide de novo prediction of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 overrepresented CRE motifs and 746 overrepresented combinatorial patterns of these motifs, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs, as well as the CREs in a CRM taken as a whole, are more conserved than randomly selected sequences. CONCLUSION: Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1047) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4265420 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4265420 2014-12-15 De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets Niu, Meng Tabari, Ehsan S Su, Zhengchang BMC Genomics Methodology Article BioMed Central 2014-12-02 /pmc/articles/PMC4265420/ /pubmed/25442502 http://dx.doi.org/10.1186/1471-2164-15-1047 Text en © Niu et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Niu, Meng Tabari, Ehsan S Su, Zhengchang De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets |
title | De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets |
title_full | De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets |
title_fullStr | De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets |
title_full_unstemmed | De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets |
title_short | De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets |
title_sort | de novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of chip datasets |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4265420/ https://www.ncbi.nlm.nih.gov/pubmed/25442502 http://dx.doi.org/10.1186/1471-2164-15-1047 |
work_keys_str_mv | AT niumeng denovopredictionofcisregulatoryelementsandmodulesthroughintegrativeanalysisofalargenumberofchipdatasets AT tabariehsans denovopredictionofcisregulatoryelementsandmodulesthroughintegrativeanalysisofalargenumberofchipdatasets AT suzhengchang denovopredictionofcisregulatoryelementsandmodulesthroughintegrativeanalysisofalargenumberofchipdatasets |