GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge
METHOD: Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) o...
Main author: | Wagner, Florian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4648502/ https://www.ncbi.nlm.nih.gov/pubmed/26575370 http://dx.doi.org/10.1371/journal.pone.0143196 |
_version_ | 1782401245234855936 |
---|---|
author | Wagner, Florian |
author_facet | Wagner, Florian |
author_sort | Wagner, Florian |
collection | PubMed |
description | METHOD: Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. RESULTS: I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets. |
format | Online Article Text |
id | pubmed-4648502 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4648502 2015-11-25 GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge Wagner, Florian PLoS One Research Article Public Library of Science 2015-11-17 /pmc/articles/PMC4648502/ /pubmed/26575370 http://dx.doi.org/10.1371/journal.pone.0143196 Text en © 2015 Florian Wagner http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Wagner, Florian GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge |
title | GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge |
title_full | GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge |
title_fullStr | GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge |
title_full_unstemmed | GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge |
title_short | GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge |
title_sort | go-pca: an unsupervised method to explore gene expression data using prior knowledge |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4648502/ https://www.ncbi.nlm.nih.gov/pubmed/26575370 http://dx.doi.org/10.1371/journal.pone.0143196 |
work_keys_str_mv | AT wagnerflorian gopcaanunsupervisedmethodtoexploregeneexpressiondatausingpriorknowledge |