ModuleDigger: an itemset mining framework for the detection of cis-regulatory modules

Bibliographic Details
Main Authors: Sun, Hong; De Bie, Tijl; Storms, Valerie; Fu, Qiang; Dhollander, Thomas; Lemmens, Karen; Verstuyf, Annemieke; De Moor, Bart; Marchal, Kathleen
Format: Text
Language: English
Published: BioMed Central 2009
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648767/
https://www.ncbi.nlm.nih.gov/pubmed/19208131
http://dx.doi.org/10.1186/1471-2105-10-S1-S30
Description
Summary: BACKGROUND: The detection of cis-regulatory modules (CRMs) that mediate transcriptional responses in eukaryotes remains a key challenge in the postgenomic era. A CRM is characterized by a set of co-occurring transcription factor binding sites (TFBS). In silico methods have been developed to search for CRMs by determining the combination of TFBS that is statistically overrepresented in a given gene set. Most of these methods solve this combinatorial problem by relying on computationally intensive optimization methods. As a result, their use is limited to finding CRMs in small datasets (containing only a few genes) and to binding sites for a restricted number of transcription factors (TFs), out of which the optimal module is selected.

RESULTS: We present an itemset-mining-based strategy for computationally detecting cis-regulatory modules (CRMs) in a set of genes. We tested our method by applying it to a large benchmark dataset derived from a ChIP-chip analysis and compared its performance with that of other well-known cis-regulatory module detection tools.

CONCLUSION: We show that by exploiting the computational efficiency of an itemset mining approach and combining it with a well-designed statistical scoring scheme, we were able to prioritize the biologically valid CRMs in a large set of coregulated genes, using binding sites for a large number of potential TFs as input.