Comparison and evaluation of pathway-level aggregation methods of gene expression data

BACKGROUND: Microarray experiments produce expression measurements at genomic scale. One way to derive functional understanding of the data is to focus on functional sets of genes, such as pathways, instead of individual genes. While a common practice for the pathway-level analysis has been functional...

Bibliographic Details
Main Author: Hwang, Seungwoo
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521227/
https://www.ncbi.nlm.nih.gov/pubmed/23282027
http://dx.doi.org/10.1186/1471-2164-13-S7-S26
_version_ 1782252909633732608
author Hwang, Seungwoo
collection PubMed
description BACKGROUND: Microarray experiments produce expression measurements at genomic scale. One way to derive functional understanding of the data is to focus on functional sets of genes, such as pathways, instead of individual genes. While a common practice for pathway-level analysis has been functional enrichment analysis, such as over-representation analysis and gene set enrichment analysis, an alternative approach has also been explored. In this approach, gene expression data are first aggregated at the pathway level to transform the original data into a compact representation in which each row corresponds to a pathway instead of a gene. Thereafter, the pathway expression data can be used for differential expression and classification analyses in pathway space, leveraging existing algorithms usually applied to gene expression data. While several studies have proposed pathway-level aggregation methods, it remains unclear how they compare with one another, since the evaluations were done to a limited extent. Thus, this study presents a comprehensive evaluation of the six most prominent aggregation methods. RESULTS: The compared methods include five existing methods--mean of all member genes (Mean all), mean of condition-responsive genes (Mean CORGs), analysis of sample set enrichment scores (ASSESS), principal component analysis (PCA), and partial least squares (PLS)--and a variant of an existing method (Mean top 50%, averaging the top half of member genes). Comprehensive and stringent benchmarking was performed by collecting seven pairs of related but independent datasets encompassing various phenotypes. Aggregation was done in the space of KEGG pathways. Performance of the methods was assessed by classification accuracy, validated both internally and externally, and by examining the extent of correlation of pathway signatures between the dataset pairs. The assessment revealed that (i) the best accuracy and correlation were obtained from ASSESS and Mean top 50%, (ii) Mean all showed the lowest accuracy, and (iii) Mean CORGs and PLS gave rise to the largest extent of discordance in the pathway signature correlation. CONCLUSIONS: The two best-performing methods (ASSESS and Mean top 50%) are suggested to be preferred. The benchmarking analysis also suggests that there is both room and necessity for developing a novel method for pathway-level aggregation.
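Illustrative note: the two simplest aggregation schemes named in the description (Mean all and Mean top 50%) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the article's implementation. The function names, the input layout (genes in rows, samples in columns), and in particular the ranking of member genes by absolute Welch t-statistic between two phenotype classes are assumptions made here for illustration; the record does not say how the "top half" is chosen.

```python
import numpy as np


def _member_rows(gene_ids, members):
    """Row indices of the pathway members that were actually measured."""
    index = {g: i for i, g in enumerate(gene_ids)}
    return np.array([index[g] for g in members if g in index], dtype=int)


def mean_all(expr, gene_ids, pathways):
    """Mean all: average every member gene of a pathway, per sample.

    expr     : (n_genes, n_samples) expression matrix
    gene_ids : row labels of expr
    pathways : dict mapping pathway name -> iterable of gene ids
    Returns (names, matrix) with one row per pathway.
    """
    names, rows = [], []
    for name, members in pathways.items():
        idx = _member_rows(gene_ids, members)
        if idx.size == 0:  # skip pathways with no measured genes
            continue
        names.append(name)
        rows.append(expr[idx].mean(axis=0))
    return names, np.vstack(rows)


def abs_t_stat(expr, labels):
    """Absolute Welch t-statistic per gene (ASSUMED ranking criterion)."""
    labels = np.asarray(labels, dtype=bool)
    a, b = expr[:, labels], expr[:, ~labels]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return np.abs(a.mean(axis=1) - b.mean(axis=1)) / np.maximum(se, 1e-12)


def mean_top_half(expr, gene_ids, pathways, labels):
    """Mean top 50%: average only the better-ranked half of member genes."""
    score = abs_t_stat(expr, labels)
    names, rows = [], []
    for name, members in pathways.items():
        idx = _member_rows(gene_ids, members)
        if idx.size == 0:
            continue
        k = max(1, int(np.ceil(idx.size / 2)))          # size of the top half
        top = idx[np.argsort(score[idx])[::-1][:k]]     # highest scores first
        names.append(name)
        rows.append(expr[top].mean(axis=0))
    return names, np.vstack(rows)
```

Both functions return a matrix with one row per pathway and one column per sample, so downstream classifiers written for gene-level matrices can be applied unchanged. In a real evaluation the ranking would have to be computed on training samples only, to avoid leaking class labels into the test set.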
format Online
Article
Text
id pubmed-3521227
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3521227 2012-12-14 Comparison and evaluation of pathway-level aggregation methods of gene expression data Hwang, Seungwoo BMC Genomics Proceedings BioMed Central 2012-12-07 /pmc/articles/PMC3521227/ /pubmed/23282027 http://dx.doi.org/10.1186/1471-2164-13-S7-S26 Text en Copyright ©2012 Hwang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Hwang, Seungwoo
Comparison and evaluation of pathway-level aggregation methods of gene expression data
title Comparison and evaluation of pathway-level aggregation methods of gene expression data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521227/
https://www.ncbi.nlm.nih.gov/pubmed/23282027
http://dx.doi.org/10.1186/1471-2164-13-S7-S26