Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data
Main authors: | Zhang, Yuji; Xuan, Jianhua; de los Reyes, Benildo G; Clarke, Robert; Ressom, Habtom W |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2008 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386822/ https://www.ncbi.nlm.nih.gov/pubmed/18426580 http://dx.doi.org/10.1186/1471-2105-9-203 |
_version_ | 1782155271168065536 |
---|---|
author | Zhang, Yuji; Xuan, Jianhua; de los Reyes, Benildo G; Clarke, Robert; Ressom, Habtom W |
collection | PubMed |
description | BACKGROUND: Integrating data from multiple global assays and curated databases is essential for understanding the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network (TRN) with strong similarity to the structure of the underlying genetic regulatory modules. Decomposing the TRN into a small set of recurring regulatory patterns, called network motifs (NM), facilitates the inference. Identifying NMs defined by specific transcription factors (TF) establishes the framework structure of a TRN and allows the inference of TF-target gene relationships. This paper introduces a computational framework for utilizing data from multiple sources to infer TF-target gene relationships on the basis of NMs. The data include time-course gene expression profiles, genome-wide location analysis data, binding sequence data, and gene ontology (GO) information. RESULTS: The proposed computational framework was tested using gene expression data associated with cell cycle progression in yeast. Among 800 cell cycle-related genes, 85 were identified as candidate TFs and classified into four previously defined NMs. The NMs for a subset of TFs were obtained from the literature. Support vector machine (SVM) classifiers were used to estimate NMs for the remaining TFs. The potential downstream target genes for the TFs were clustered into 34 biologically significant groups. The relationships between TFs and potential target gene clusters were examined by training recurrent neural networks whose topologies mimic the NMs into which the TFs are classified. The identified relationships between TFs and gene clusters were evaluated using the following biological validation and statistical analyses: (1) Gene set enrichment analysis (GSEA) to evaluate the clustering results; (2) Leave-one-out cross-validation (LOOCV) to ensure that the SVM classifiers assign TFs to NM categories with high confidence; (3) Binding site enrichment analysis (BSEA) to determine enrichment of the gene clusters for the cognate binding sites of their predicted TFs; (4) Comparison with previously reported results in the literature to confirm the inferred regulations. CONCLUSION: The major contribution of this study is the development of a computational framework to assist the inference of TRNs by integrating heterogeneous data from multiple sources and by decomposing a TRN into NM-based modules. The inference capability of the proposed framework is verified statistically (e.g., LOOCV) and biologically (e.g., GSEA, BSEA, and literature validation). The proposed framework is useful for inferring small NM-based modules of TF-target gene relationships that can serve as a basis for generating new testable hypotheses. |
format | Text |
id | pubmed-2386822 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2386822 2008-05-19; BMC Bioinformatics; Research Article; BioMed Central 2008-04-21; /pmc/articles/PMC2386822/ /pubmed/18426580 http://dx.doi.org/10.1186/1471-2105-9-203; Text; en; Copyright © 2008 Zhang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386822/ https://www.ncbi.nlm.nih.gov/pubmed/18426580 http://dx.doi.org/10.1186/1471-2105-9-203 |
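The description notes that leave-one-out cross-validation (LOOCV) was used to check how confidently the SVM classifiers assign TFs to network-motif categories. The sketch below illustrates only the LOOCV loop, under stated assumptions: to stay dependency-free it swaps the SVM for a simple nearest-centroid classifier, and the feature vectors and motif labels (`motif_A`, `motif_B`) are hypothetical toy values, not data from the study.

```python
import math
from collections import defaultdict


def nearest_centroid_predict(train, x):
    """Stand-in classifier (NOT an SVM): assign x to the label whose mean
    feature vector is closest in Euclidean distance."""
    sums = {}
    counts = defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, v in enumerate(features):
            sums[label][i] += v
        counts[label] += 1
    best_label, best_dist = None, math.inf
    for label, s in sums.items():
        centroid = [v / counts[label] for v in s]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(samples, predict):
    """Leave-one-out CV: hold out each sample once, train on all the others,
    and return the fraction of held-out samples predicted correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if predict(train, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: (feature vector for a TF, motif label).
samples = [
    ([0.9, 0.1], "motif_A"),
    ([0.8, 0.2], "motif_A"),
    ([1.0, 0.0], "motif_A"),
    ([0.1, 0.9], "motif_B"),
    ([0.2, 0.8], "motif_B"),
    ([0.0, 1.0], "motif_B"),
]
accuracy = loocv_accuracy(samples, nearest_centroid_predict)
```

Any classifier with the same `predict(train, x)` shape, such as an SVM from a machine-learning library, could be passed to `loocv_accuracy` instead. LOOCV suits a small labelled set like the literature-annotated TFs, because each TF is held out exactly once while all the others are used for training.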
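Binding site enrichment analysis (BSEA) asks whether a gene cluster contains more genes carrying a TF's cognate binding site than chance would predict. The record does not state which test was used; one common choice, assumed here, is the upper-tail hypergeometric test. Apart from the 800-gene background taken from the description, the counts in the example call are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background set
    K: background genes carrying the TF's binding site
    n: genes in the cluster being tested
    k: cluster genes carrying the binding site
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("invalid counts")
    total = comb(N, n)
    # Sum the probabilities of observing i = k, k+1, ..., min(n, K) site-bearing genes.
    # math.comb returns 0 when the lower index exceeds the upper, so impossible terms vanish.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper / total


# Hypothetical: 120 of 800 genes carry the site; 12 of a 30-gene cluster do
# (about 4.5 would be expected by chance).
p = hypergeom_enrichment_p(800, 120, 30, 12)
```

Because each cluster is tested against its predicted TF, a multiple-testing correction across the clusters would normally be applied to these p-values before calling a cluster enriched.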
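The TF-cluster relationships were examined with recurrent neural networks whose topologies mimic the assigned network motifs. The following is a minimal sketch of that idea, assuming a common Euler-discretized recurrent model of expression dynamics; the update rule, `dt`, the decay terms, and the three-node feed-forward-loop mask `FFL_MASK` are illustrative assumptions, not the paper's exact formulation. The motif enters only through a 0/1 mask that zeroes every connection the topology does not allow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 3-node feed-forward loop: A regulates B and C, B regulates C.
# FFL_MASK[i][j] == 1 means node j may regulate node i.
FFL_MASK = [
    [0, 0, 0],  # A: no regulators inside the motif
    [1, 0, 0],  # B <- A
    [1, 1, 0],  # C <- A, B
]


def apply_mask(weights, mask):
    """Zero out every weight the motif topology does not allow."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]


def simulate(weights, bias, decay, x0, steps, dt=0.1):
    """Assumed recurrent model, one Euler step per time point:
    x_i(t+1) = x_i(t) + dt * (sigmoid(sum_j w_ij * x_j(t) + b_i) - decay_i * x_i(t))."""
    x = list(x0)
    traj = [list(x)]
    for _ in range(steps):
        # The comprehension reads the previous x throughout; x is rebound afterwards.
        x = [
            xi + dt * (sigmoid(sum(w * xj for w, xj in zip(w_row, x)) + b) - d * xi)
            for xi, w_row, b, d in zip(x, weights, bias, decay)
        ]
        traj.append(x)
    return traj


def mse(traj, observed):
    """Mean squared error between simulated and observed time courses."""
    diffs = [(s - o) ** 2 for s_row, o_row in zip(traj, observed) for s, o in zip(s_row, o_row)]
    return sum(diffs) / len(diffs)


raw = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [0.8, 1.2, 0.4]]  # hypothetical weights
W = apply_mask(raw, FFL_MASK)
bias, decay = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
traj = simulate(W, bias, decay, [0.1, 0.1, 0.1], steps=50)
# Changing B and C's starting values cannot affect A, which has no in-motif regulators.
traj2 = simulate(W, bias, decay, [0.1, 0.9, 0.5], steps=50)
```

Fitting such a model would tune the unmasked weights, biases, and decay rates to minimize `mse` against measured time-course profiles; that optimization step is omitted here.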