
TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)

BACKGROUND: Identifying the key transcription factors (TFs) controlling a biological process is the first step toward a better understanding of underpinning regulatory mechanisms. However, due to the involvement of a large number of genes and complex interactions in gene regulatory networks, identif...


Bibliographic Details
Main authors: Nie, Jeff; Stewart, Ron; Zhang, Hang; Thomson, James A; Ruan, Fang; Cui, Xiaoqi; Wei, Hairong
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3101171/
https://www.ncbi.nlm.nih.gov/pubmed/21496241
http://dx.doi.org/10.1186/1752-0509-5-53
_version_ 1782204248048533504
author Nie, Jeff
Stewart, Ron
Zhang, Hang
Thomson, James A
Ruan, Fang
Cui, Xiaoqi
Wei, Hairong
author_facet Nie, Jeff
Stewart, Ron
Zhang, Hang
Thomson, James A
Ruan, Fang
Cui, Xiaoqi
Wei, Hairong
author_sort Nie, Jeff
collection PubMed
description BACKGROUND: Identifying the key transcription factors (TFs) controlling a biological process is the first step toward a better understanding of underpinning regulatory mechanisms. However, due to the involvement of a large number of genes and complex interactions in gene regulatory networks, identifying TFs involved in a biological process remains particularly difficult. The challenges include: (1) Most eukaryotic genomes encode thousands of TFs, which are organized in gene families of various sizes and in many cases with poor sequence conservation, making it difficult to recognize TFs for a biological process; (2) Transcription usually involves several hundred genes that generate a combination of intrinsic noise from upstream signaling networks and lead to fluctuations in transcription; (3) A TF can function in different cell types or developmental stages. Currently, the methods available for identifying TFs involved in biological processes are still very scarce, and the development of novel, more powerful methods is desperately needed. RESULTS: We developed a computational pipeline called TF-Cluster for identifying functionally coordinated TFs in two steps: (1) Construction of a shared coexpression connectivity matrix (SCCM), in which each entry represents the number of shared coexpressed genes between two TFs. This sparse and symmetric matrix embodies a new concept of coexpression networks in which genes are associated in the context of other shared coexpressed genes; (2) Decomposition of the SCCM using a novel heuristic algorithm termed "Triple-Link", which searches for the highest connectivity in the SCCM, and then uses the two connected TFs as a primer for growing a TF cluster with a number of linking criteria.
We applied TF-Cluster to microarray data from human stem cells and Arabidopsis roots, and then demonstrated that many of the resulting TF clusters contain functionally coordinated TFs that, based on existing literature, accurately represent a biological process of interest. CONCLUSIONS: TF-Cluster can be used to identify a set of TFs controlling a biological process of interest from gene expression data. Its high accuracy in recognizing true positive TFs involved in a biological process makes it extremely valuable in building core GRNs controlling a biological process. The pipeline, implemented in Perl, can be installed on various platforms.
format Text
id pubmed-3101171
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-31011712011-05-25 TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM) Nie, Jeff Stewart, Ron Zhang, Hang Thomson, James A Ruan, Fang Cui, Xiaoqi Wei, Hairong BMC Syst Biol Methodology Article BACKGROUND: Identifying the key transcription factors (TFs) controlling a biological process is the first step toward a better understanding of underpinning regulatory mechanisms. However, due to the involvement of a large number of genes and complex interactions in gene regulatory networks, identifying TFs involved in a biological process remains particularly difficult. The challenges include: (1) Most eukaryotic genomes encode thousands of TFs, which are organized in gene families of various sizes and in many cases with poor sequence conservation, making it difficult to recognize TFs for a biological process; (2) Transcription usually involves several hundred genes that generate a combination of intrinsic noise from upstream signaling networks and lead to fluctuations in transcription; (3) A TF can function in different cell types or developmental stages. Currently, the methods available for identifying TFs involved in biological processes are still very scarce, and the development of novel, more powerful methods is desperately needed. RESULTS: We developed a computational pipeline called TF-Cluster for identifying functionally coordinated TFs in two steps: (1) Construction of a shared coexpression connectivity matrix (SCCM), in which each entry represents the number of shared coexpressed genes between two TFs. 
This sparse and symmetric matrix embodies a new concept of coexpression networks in which genes are associated in the context of other shared coexpressed genes; (2) Decomposition of the SCCM using a novel heuristic algorithm termed "Triple-Link", which searches the highest connectivity in the SCCM, and then uses two connected TF as a primer for growing a TF cluster with a number of linking criteria. We applied TF-Cluster to microarray data from human stem cells and Arabidopsis roots, and then demonstrated that many of the resulting TF clusters contain functionally coordinated TFs that, based on existing literature, accurately represent a biological process of interest. CONCLUSIONS: TF-Cluster can be used to identify a set of TFs controlling a biological process of interest from gene expression data. Its high accuracy in recognizing true positive TFs involved in a biological process makes it extremely valuable in building core GRNs controlling a biological process. The pipeline implemented in Perl can be installed in various platforms. BioMed Central 2011-04-15 /pmc/articles/PMC3101171/ /pubmed/21496241 http://dx.doi.org/10.1186/1752-0509-5-53 Text en Copyright ©2011 Nie et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Nie, Jeff
Stewart, Ron
Zhang, Hang
Thomson, James A
Ruan, Fang
Cui, Xiaoqi
Wei, Hairong
TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)
title TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)
title_full TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)
title_fullStr TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)
title_full_unstemmed TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)
title_short TF-Cluster: A pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (SCCM)
title_sort tf-cluster: a pipeline for identifying functionally coordinated transcription factors via network decomposition of the shared coexpression connectivity matrix (sccm)
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3101171/
https://www.ncbi.nlm.nih.gov/pubmed/21496241
http://dx.doi.org/10.1186/1752-0509-5-53
work_keys_str_mv AT niejeff tfclusterapipelineforidentifyingfunctionallycoordinatedtranscriptionfactorsvianetworkdecompositionofthesharedcoexpressionconnectivitymatrixsccm
AT stewartron tfclusterapipelineforidentifyingfunctionallycoordinatedtranscriptionfactorsvianetworkdecompositionofthesharedcoexpressionconnectivitymatrixsccm
AT zhanghang tfclusterapipelineforidentifyingfunctionallycoordinatedtranscriptionfactorsvianetworkdecompositionofthesharedcoexpressionconnectivitymatrixsccm
AT thomsonjamesa tfclusterapipelineforidentifyingfunctionallycoordinatedtranscriptionfactorsvianetworkdecompositionofthesharedcoexpressionconnectivitymatrixsccm
AT ruanfang tfclusterapipelineforidentifyingfunctionallycoordinatedtranscriptionfactorsvianetworkdecompositionofthesharedcoexpressionconnectivitymatrixsccm
AT cuixiaoqi tfclusterapipelineforidentifyingfunctionallycoordinatedtranscriptionfactorsvianetworkdecompositionofthesharedcoexpressionconnectivitymatrixsccm
AT weihairong tfclusterapipelineforidentifyingfunctionallycoordinatedtranscriptionfactorsvianetworkdecompositionofthesharedcoexpressionconnectivitymatrixsccm
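
Illustrative sketch (not part of the catalog record): the abstract describes the SCCM as a symmetric matrix whose entries count the coexpressed genes shared by two TFs, and says the decomposition starts from the pair with the highest connectivity. The Python below is a minimal sketch of those two ideas only. It is not the authors' Perl pipeline, and it does not implement the Triple-Link linking criteria, which the abstract does not specify. The function names, the use of Pearson correlation, and the "top k genes" coexpression rule are all assumptions made for illustration.

```python
import numpy as np


def top_coexpressed(tf_expr, gene_expr, k):
    """Return, for each TF, the set of indices of its k most correlated genes.

    Assumption: coexpression is taken as the top-k genes by Pearson
    correlation; the record does not state the criterion actually used.
    Rows are TFs or genes, columns are samples; rows must not be constant.
    """
    def zscore(m):
        m = m - m.mean(axis=1, keepdims=True)
        return m / m.std(axis=1, keepdims=True)

    # Dot product of z-scored rows divided by sample count = Pearson r.
    corr = zscore(tf_expr) @ zscore(gene_expr).T / tf_expr.shape[1]
    order = np.argsort(-corr, axis=1)[:, :k]
    return [set(row.tolist()) for row in order]


def build_sccm(coexpressed_sets):
    """Build the shared coexpression connectivity matrix.

    Entry (i, j) is the number of coexpressed genes shared by TF i and TF j.
    The diagonal is left at zero in this sketch.
    """
    n = len(coexpressed_sets)
    sccm = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(coexpressed_sets[i] & coexpressed_sets[j])
            sccm[i, j] = sccm[j, i] = shared
    return sccm


def seed_pair(sccm):
    """Return the TF pair (i, j), i < j, with the highest connectivity."""
    upper = np.triu(sccm, k=1)  # ignore the diagonal and mirrored entries
    i, j = np.unravel_index(np.argmax(upper), upper.shape)
    return int(i), int(j)
```

Given expression matrices with TFs and genes as rows, `build_sccm(top_coexpressed(tf_expr, gene_expr, k))` yields the matrix, and `seed_pair` picks the pair that would serve as the primer for growing a cluster. Ties are resolved by taking the first maximum in row-major order, which is a choice of this sketch rather than anything stated in the record.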