TF-finder: A software package for identifying transcription factors involved in biological processes using microarray data and existing knowledge base
BACKGROUND: Identification of transcription factors (TFs) involved in a biological process is the first step towards a better understanding of the underlying regulatory mechanisms. However, due to the involvement of a large number of genes and complicated interactions in a gene regulatory network (G...
Main authors: | Cui, Xiaoqi; Wang, Tong; Chen, Huann-Sheng; Busov, Victor; Wei, Hairong |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2930629/ https://www.ncbi.nlm.nih.gov/pubmed/20704747 http://dx.doi.org/10.1186/1471-2105-11-425 |
_version_ | 1782185989527044096 |
---|---|
author | Cui, Xiaoqi Wang, Tong Chen, Huann-Sheng Busov, Victor Wei, Hairong |
author_sort | Cui, Xiaoqi |
collection | PubMed |
description | BACKGROUND: Identification of transcription factors (TFs) involved in a biological process is the first step towards a better understanding of the underlying regulatory mechanisms. However, due to the involvement of a large number of genes and complicated interactions in a gene regulatory network (GRN), identification of the TFs involved in a biological process remains very challenging. In reality, the recognition of TFs for a given biological process can be further complicated by the fact that most eukaryotic genomes encode thousands of TFs, which are organized in gene families of various sizes and in many cases with poor sequence conservation except for small conserved domains. This poses a significant challenge for identifying the exact TFs involved or ranking the importance of a set of TFs to a process of interest. Therefore, new methods for recognizing novel TFs are desperately needed. Although a plethora of methods have been developed to infer regulatory genes using microarray data, methods that use existing knowledge bases, in particular the validated genes known to be involved in a process, to bait/guide the discovery of novel TFs are still rare. Such methods can replace the sometimes-arbitrary selection of candidate genes for experimental validation and significantly advance our knowledge and understanding of the regulation of a process. RESULTS: We developed an automated software package called TF-finder for recognizing TFs involved in a biological process using microarray data and existing knowledge bases. TF-finder contains two components for TF recognition: adaptive sparse canonical correlation analysis (ASCCA) and an enrichment test. ASCCA uses positive target genes to bait TFs from gene expression data, while the enrichment test examines the presence of positive TFs in the outcomes from ASCCA. Using microarray data from salt and water stress experiments, we showed that TF-finder is very efficient in recognizing many important TFs involved in salt and drought tolerance, as evidenced by the rediscovery of TFs that have been experimentally validated. The efficiency of TF-finder in recognizing novel TFs was further confirmed by a thorough comparison with a method called Intersection of Coexpression (ICE). CONCLUSIONS: TF-finder can be successfully used to infer novel TFs involved in a biological process of interest using publicly available gene expression data and known positive genes from existing knowledge bases. The package for TF-finder includes an R script for ASCCA, a Perl controller, and several Perl scripts for parsing intermediate outputs. The package is available upon request (hairong@mtu.edu). The R code for standalone ASCCA is also available. |
format | Text |
id | pubmed-2930629 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-29306292010-09-07 TF-finder: A software package for identifying transcription factors involved in biological processes using microarray data and existing knowledge base Cui, Xiaoqi Wang, Tong Chen, Huann-Sheng Busov, Victor Wei, Hairong BMC Bioinformatics Methodology Article BioMed Central 2010-08-12 /pmc/articles/PMC2930629/ /pubmed/20704747 http://dx.doi.org/10.1186/1471-2105-11-425 Text en Copyright ©2010 Cui et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | TF-finder: A software package for identifying transcription factors involved in biological processes using microarray data and existing knowledge base |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2930629/ https://www.ncbi.nlm.nih.gov/pubmed/20704747 http://dx.doi.org/10.1186/1471-2105-11-425 |
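
The description above says that TF-finder applies an enrichment test to check for known positive TFs among the ASCCA outcomes, but the record does not state which statistic is used. As a rough illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of over-representation question. The function name and all argument names are hypothetical and are not taken from the TF-finder package, which is written in R and Perl.

```python
from math import comb


def hypergeom_enrichment_pvalue(population: int, positives: int,
                                drawn: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    Assumed setup (not confirmed by the source record):
      population: number of candidate TFs considered in total
      positives:  how many of them are known positive TFs
      drawn:      number of TFs returned by a run (e.g. an ASCCA outcome)
      hits:       how many returned TFs are known positives
    """
    if not (0 <= positives <= population and 0 <= drawn <= population):
        raise ValueError("positives and drawn must lie in [0, population]")
    upper = min(positives, drawn)  # largest possible number of hits
    if hits > upper:
        return 0.0
    lower = max(hits, 0)
    total = comb(population, drawn)
    # Sum the upper tail; comb(n, k) is 0 when k > n, so impossible
    # combinations contribute nothing.
    tail = sum(
        comb(positives, k) * comb(population - positives, drawn - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the article.
    p = hypergeom_enrichment_pvalue(population=10, positives=3, drawn=4, hits=2)
    print(f"p-value = {p:.4f}")
```

A small p-value would suggest that the returned list contains more known positive TFs than expected by chance, which is the kind of evidence the description attributes to the enrichment step. The exact integer arithmetic from `math.comb` avoids floating-point underflow for small tails, at the cost of speed on very large gene sets.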