Integrating gene expression and GO classification for PCA by preclustering

BACKGROUND: Gene expression data can be analyzed by summarizing groups of individual gene expression profiles based on GO annotation information. The mean expression profile per group can then be used to identify interesting GO categories in relation to the experimental settings. However, the expres...

Bibliographic Details
Main Authors: De Haan, Jorn R, Piek, Ester, van Schaik, Rene C, de Vlieg, Jacob, Bauerschmidt, Susanne, Buydens, Lutgarde MC, Wehrens, Ron
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2860362/
https://www.ncbi.nlm.nih.gov/pubmed/20346140
http://dx.doi.org/10.1186/1471-2105-11-158
author De Haan, Jorn R
Piek, Ester
van Schaik, Rene C
de Vlieg, Jacob
Bauerschmidt, Susanne
Buydens, Lutgarde MC
Wehrens, Ron
collection PubMed
description BACKGROUND: Gene expression data can be analyzed by summarizing groups of individual gene expression profiles based on GO annotation information. The mean expression profile per group can then be used to identify interesting GO categories in relation to the experimental settings. However, the expression profiles present in GO classes are often heterogeneous, i.e., there are several different expression profiles within one class. As a result, important experimental findings can be obscured because the summarizing profile does not seem to be of interest. We propose to tackle this problem by finding homogeneous subclasses within GO categories: preclustering. RESULTS: Two microarray datasets are analyzed. First, a selection of genes from a well-known Saccharomyces cerevisiae dataset is used. The GO class "cell wall organization and biogenesis" is shown as a specific example. After preclustering, this term can be associated with different phases in the cell cycle, where it could not be associated with a specific phase previously. Second, a dataset of differentiation of human Mesenchymal Stem Cells (MSC) into osteoblasts is used. For this dataset results are shown in which the GO term "skeletal development" is a specific example of a heterogeneous GO class for which better associations can be made after preclustering. The Intra Cluster Correlation (ICC), a measure of cluster tightness, is applied to identify relevant clusters. CONCLUSIONS: We show that this method leads to an improved interpretability of results in Principal Component Analysis.
format Text
id pubmed-2860362
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2860362 2010-04-28 BMC Bioinformatics Research article
BioMed Central 2010-03-26 /pmc/articles/PMC2860362/ /pubmed/20346140 http://dx.doi.org/10.1186/1471-2105-11-158 Text en Copyright ©2010 De Haan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Integrating gene expression and GO classification for PCA by preclustering
topic Research article