Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering

Bibliographic Details
Main authors: Dotan-Cohen, Dikla; Kasif, Simon; Melkman, Avraham A.
Format: Text
Language: English
Published: Oxford University Press, 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705235/
https://www.ncbi.nlm.nih.gov/pubmed/19497934
http://dx.doi.org/10.1093/bioinformatics/btp327
author Dotan-Cohen, Dikla
Kasif, Simon
Melkman, Avraham A.
collection PubMed
description Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to improve the biological relevance of the clusters that are subjected to subsequent scrutiny. The structure of the GO is another source of background knowledge that can be exploited through the use of semantic similarity.
Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein–protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein–protein interaction data.
Contact: dotna@cs.bgu.ac.il
Supplementary information: Supplementary data are available at Bioinformatics online.
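The abstract's Results statement describes deriving clusters from an expression dendrogram with the help of GO semantic similarity, but this record does not spell out the algorithm. As a rough illustration only, and explicitly not the authors' method, the sketch below shows one generic way to cut a binary dendrogram using annotation similarity: a bottom-up dynamic program that, at each node, either keeps the whole subtree as one cluster or combines its children's best partitions, whichever gives the larger summed within-cluster score sim(g, h) - tau. The similarity is a simUI-style measure (Jaccard index of ancestor-closed term sets). The toy ontology, term names, genes, dendrogram, the threshold `tau`, and every function name are hypothetical. Unlike the uniform treatment of annotated and unannotated genes described in the abstract, this toy simply gives an unannotated gene zero similarity to everything.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy ontology: term -> parent terms (a tiny DAG, not real GO IDs).
PARENTS: Dict[str, Set[str]] = {
    "immune_response": {"response_to_stimulus"},
    "response_to_stimulus": {"biological_process"},
    "glucose_metabolism": {"carbohydrate_metabolism"},
    "carbohydrate_metabolism": {"metabolic_process"},
    "metabolic_process": {"biological_process"},
}


def closure(terms: Set[str], parents: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Return the terms plus all of their ancestors (true-path closure)."""
    seen = set(terms)
    stack = list(terms)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return frozenset(seen)


def sim_ui(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """simUI-style similarity: Jaccard index of two ancestor-closed term sets."""
    if not a or not b:
        return 0.0  # simplifying assumption: an unannotated gene matches nothing
    return len(a & b) / len(a | b)


def leaves(node) -> List[str]:
    """A dendrogram node is a gene name (leaf) or a (left, right) pair."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def cluster_score(genes: List[str], closed: Dict[str, FrozenSet[str]], tau: float) -> float:
    """Sum of (similarity - tau) over all gene pairs in one candidate cluster."""
    return sum(sim_ui(closed[g], closed[h]) - tau for g, h in combinations(genes, 2))


def best_cut(node, closed, tau) -> Tuple[float, List[List[str]]]:
    """Best subtree-consistent partition: keep node whole, or join children's best cuts."""
    genes = leaves(node)
    whole = cluster_score(genes, closed, tau)
    if isinstance(node, str):
        return whole, [genes]  # a single gene scores 0 (no pairs)
    left_score, left_clusters = best_cut(node[0], closed, tau)
    right_score, right_clusters = best_cut(node[1], closed, tau)
    if whole >= left_score + right_score:
        return whole, [genes]
    return left_score + right_score, left_clusters + right_clusters


# Hypothetical annotations; g5 is unannotated.
annotations = {
    "g1": {"immune_response"},
    "g2": {"immune_response"},
    "g3": {"glucose_metabolism"},
    "g4": {"glucose_metabolism"},
    "g5": set(),
}
closed = {g: closure(t, PARENTS) for g, t in annotations.items()}

# Hypothetical expression dendrogram written as nested pairs.
tree = ((("g1", "g2"), "g5"), ("g3", "g4"))
score, clusters = best_cut(tree, closed, tau=0.5)
print(score, clusters)  # prints: 1.0 [['g1', 'g2'], ['g5'], ['g3', 'g4']]
```

Restricting candidate clusters to subtrees keeps the expression-based structure intact, while `tau` trades cluster size against semantic coherence: pairs less similar than `tau` push the program toward splitting. In this toy run g5 ends up alone only because of the zero-similarity assumption above.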
format Text
id pubmed-2705235
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
journal Bioinformatics (Original Paper), Oxford University Press, 2009-07-15; published online 2009-06-03
license This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705235/
https://www.ncbi.nlm.nih.gov/pubmed/19497934
http://dx.doi.org/10.1093/bioinformatics/btp327