
TF-finder: A software package for identifying transcription factors involved in biological processes using microarray data and existing knowledge base

BACKGROUND: Identification of transcription factors (TFs) involved in a biological process is the first step towards a better understanding of the underlying regulatory mechanisms. However, due to the involvement of a large number of genes and complicated interactions in a gene regulatory network (G...


Bibliographic Details
Main Authors: Cui, Xiaoqi, Wang, Tong, Chen, Huann-Sheng, Busov, Victor, Wei, Hairong
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2930629/
https://www.ncbi.nlm.nih.gov/pubmed/20704747
http://dx.doi.org/10.1186/1471-2105-11-425
author Cui, Xiaoqi
Wang, Tong
Chen, Huann-Sheng
Busov, Victor
Wei, Hairong
collection PubMed
description BACKGROUND: Identification of transcription factors (TFs) involved in a biological process is the first step towards a better understanding of the underlying regulatory mechanisms. However, due to the involvement of a large number of genes and complicated interactions in a gene regulatory network (GRN), identification of the TFs involved in a biological process remains very challenging. In reality, the recognition of TFs for a given biological process can be further complicated by the fact that most eukaryotic genomes encode thousands of TFs, which are organized in gene families of various sizes and in many cases with poor sequence conservation except for small conserved domains. This poses a significant challenge for identifying the exact TFs involved or ranking the importance of a set of TFs to a process of interest. Therefore, new methods for recognizing novel TFs are desperately needed. Although a plethora of methods have been developed to infer regulatory genes using microarray data, it is still rare to find methods that use an existing knowledge base, in particular the validated genes known to be involved in a process, to bait/guide discovery of novel TFs. Such methods can replace the sometimes-arbitrary selection of candidate genes for experimental validation and significantly advance our knowledge and understanding of the regulation of a process. RESULTS: We developed an automated software package called TF-finder for recognizing TFs involved in a biological process using microarray data and an existing knowledge base. TF-finder contains two components, adaptive sparse canonical correlation analysis (ASCCA) and an enrichment test, for TF recognition. ASCCA uses positive target genes to bait TFs from gene expression data, while the enrichment test examines the presence of positive TFs in the outcomes from ASCCA. 
Using microarray data from salt and water stress experiments, we showed that TF-finder is very efficient in recognizing many important TFs involved in salt and drought tolerance, as evidenced by the rediscovery of those TFs that have been experimentally validated. The efficiency of TF-finder in recognizing novel TFs was further confirmed by a thorough comparison with a method called Intersection of Coexpression (ICE). CONCLUSIONS: TF-finder can be successfully used to infer novel TFs involved in a biological process of interest using publicly available gene expression data and known positive genes from existing knowledge bases. The package for TF-finder includes an R script for ASCCA, a Perl controller, and several Perl scripts for parsing intermediate outputs. The package is available upon request (hairong@mtu.edu). The R code for standalone ASCCA is also available.
format Text
id pubmed-2930629
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2930629 2010-09-07 TF-finder: A software package for identifying transcription factors involved in biological processes using microarray data and existing knowledge base Cui, Xiaoqi Wang, Tong Chen, Huann-Sheng Busov, Victor Wei, Hairong BMC Bioinformatics Methodology Article BioMed Central 2010-08-12 /pmc/articles/PMC2930629/ /pubmed/20704747 http://dx.doi.org/10.1186/1471-2105-11-425 Text en Copyright ©2010 Cui et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title TF-finder: A software package for identifying transcription factors involved in biological processes using microarray data and existing knowledge base
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2930629/
https://www.ncbi.nlm.nih.gov/pubmed/20704747
http://dx.doi.org/10.1186/1471-2105-11-425