GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge

Bibliographic Details
Main author: Wagner, Florian
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4648502/
https://www.ncbi.nlm.nih.gov/pubmed/26575370
http://dx.doi.org/10.1371/journal.pone.0143196
author Wagner, Florian
collection PubMed
description METHOD: Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. RESULTS: I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.
format Online
Article
Text
id pubmed-4648502
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4648502 2015-11-25 GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge Wagner, Florian PLoS One Research Article
Public Library of Science 2015-11-17 /pmc/articles/PMC4648502/ /pubmed/26575370 http://dx.doi.org/10.1371/journal.pone.0143196 Text en © 2015 Florian Wagner http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4648502/
https://www.ncbi.nlm.nih.gov/pubmed/26575370
http://dx.doi.org/10.1371/journal.pone.0143196