GSA-PCA: gene set generation by principal component analysis of the Laplacian matrix of a metabolic network

BACKGROUND: Gene Set Analysis (GSA) has proven to be a useful approach to microarray analysis. However, most of the method development for GSA has focused on the statistical tests to be used rather than on the generation of sets that will be tested. Existing methods of set generation are often overl...

Bibliographic Details
Main Authors: Jacobson, Dan; Emerton, Guy
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3626710/
https://www.ncbi.nlm.nih.gov/pubmed/22876834
http://dx.doi.org/10.1186/1471-2105-13-197
author Jacobson, Dan
Emerton, Guy
collection PubMed
description BACKGROUND: Gene Set Analysis (GSA) has proven to be a useful approach to microarray analysis. However, most of the method development for GSA has focused on the statistical tests to be used rather than on the generation of sets that will be tested. Existing methods of set generation are often overly simplistic. The creation of sets from individual pathways (in isolation) is a poor reflection of the complexity of the underlying metabolic network. We have developed a novel approach to set generation via the use of Principal Component Analysis of the Laplacian matrix of a metabolic network. We have analysed a relatively simple data set to show the difference in results between our method and the current state-of-the-art pathway-based sets. RESULTS: The sets generated with this method are semi-exhaustive and capture much of the topological complexity of the metabolic network. The semi-exhaustive nature of this method has also allowed us to design a hypergeometric enrichment test to determine which genes are likely responsible for set significance. We show that our method finds significant aspects of biology that would be missed (i.e. false negatives) and addresses the false positive rates found with the use of simple pathway-based sets. CONCLUSIONS: The set generation step for GSA is often neglected but is a crucial part of the analysis as it defines the full context for the analysis. As such, set generation methods should be robust and yield as complete a representation of the extant biological knowledge as possible. The method reported here achieves this goal and is demonstrably superior to previous set analysis methods.
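Two building blocks named in the abstract, the Laplacian matrix of a network and a hypergeometric enrichment test, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names, the four-node toy network, and the gene counts are hypothetical, and the network is treated as undirected and unweighted. This record does not describe how the authors apply PCA to the Laplacian or turn components into gene sets, so that step is not shown.

```python
from math import comb


def laplacian(nodes, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph.

    nodes: list of hashable labels; edges: iterable of (u, v) pairs.
    Self-loops and duplicate edges are ignored (an assumption of this sketch).
    """
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    L = [[0] * size for _ in range(size)]
    for u, v in edges:
        i, j = index[u], index[v]
        if i == j or L[i][j] != 0:
            continue  # skip self-loops and edges already recorded
        L[i][j] = L[j][i] = -1  # off-diagonal: minus the adjacency
        L[i][i] += 1            # diagonal: node degree
        L[j][j] += 1
    return L


def hypergeom_enrichment_p(k, N, K, n):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes flagged as 'hits',
    n: genes in the set being tested, k: flagged genes observed in that set.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical toy network: a path A - B - C - D.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_L = laplacian(toy_nodes, toy_edges)

# Hypothetical counts: 10 background genes, 4 flagged, set of 3, 2 flagged in set.
toy_p = hypergeom_enrichment_p(k=2, N=10, K=4, n=3)
```

Each row of a Laplacian built this way sums to zero, which is a quick sanity check on the construction. A small tail probability from the enrichment function would suggest that the tested set contains more flagged genes than expected by chance.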
format Online
Article
Text
id pubmed-3626710
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3626710 2013-04-24 GSA-PCA: gene set generation by principal component analysis of the Laplacian matrix of a metabolic network Jacobson, Dan Emerton, Guy BMC Bioinformatics Research Article
BioMed Central 2012-08-09 /pmc/articles/PMC3626710/ /pubmed/22876834 http://dx.doi.org/10.1186/1471-2105-13-197 Text en Copyright © 2012 Jacobson and Emerton; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title GSA-PCA: gene set generation by principal component analysis of the Laplacian matrix of a metabolic network
topic Research Article