CLEAN: CLustering Enrichment ANalysis

BACKGROUND: Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyse...

Bibliographic Details
Main authors: Freudenberg, Johannes M; Joshi, Vineet K; Hu, Zhen; Medvedovic, Mario
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2734555/
https://www.ncbi.nlm.nih.gov/pubmed/19640299
http://dx.doi.org/10.1186/1471-2105-10-234
author Freudenberg, Johannes M
Joshi, Vineet K
Hu, Zhen
Medvedovic, Mario
collection PubMed
description BACKGROUND: Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. RESULTS: We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at . The package integrates routines for calculating gene-specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView). 
CONCLUSION: Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the conclusions made about clusters of co-expressed genes over using the traditional cluster-wide scores. Using gene-specific coherence scores also simplifies the comparisons of clusterings produced by different clustering algorithms and provides a simple tool for selecting genes with a "functionally coherent" expression profile.
format Text
id pubmed-2734555
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2734555 2009-08-29 CLEAN: CLustering Enrichment ANalysis Freudenberg, Johannes M Joshi, Vineet K Hu, Zhen Medvedovic, Mario BMC Bioinformatics Methodology Article BioMed Central 2009-07-29 /pmc/articles/PMC2734555/ /pubmed/19640299 http://dx.doi.org/10.1186/1471-2105-10-234 Text en Copyright © 2009 Freudenberg et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title CLEAN: CLustering Enrichment ANalysis
topic Methodology Article