Cargando…

MGSEA – a multivariate Gene set enrichment analysis

BACKGROUND: Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data are...

Descripción completa

Detalles Bibliográficos
Autores principales: Tiong, Khong-Loon, Yeang, Chen-Hsiang
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6421703/
https://www.ncbi.nlm.nih.gov/pubmed/30885118
http://dx.doi.org/10.1186/s12859-019-2716-6
_version_ 1783404278326493184
author Tiong, Khong-Loon
Yeang, Chen-Hsiang
author_facet Tiong, Khong-Loon
Yeang, Chen-Hsiang
author_sort Tiong, Khong-Loon
collection PubMed
description BACKGROUND: Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data are proposed, yet none of them explicitly captures combinatorial relations of feature scores from multiple platforms. RESULTS: We propose multivariate GSEA (MGSEA) to capture combinatorial relations of gene set enrichment among multiple platform features. MGSEA successfully captures designed feature relations from simulated data. By applying it to the scores of delineating breast cancer and glioblastoma multiforme (GBM) subtypes from The Cancer Genome Atlas (TCGA) datasets of CNV, DNA methylation and mRNA expressions, we find that breast cancer and GBM data yield both similar and distinct outcomes. Among the enriched functional categories, subtype-specific biomarkers are dominated by mRNA expression in many functional categories in both cancer types and also by CNV in many functional categories in breast cancer. The enriched functional categories belonging to distinct combinatorial patterns are involved different oncogenic processes: cell proliferation (such as cell cycle control, estrogen responses, MYC and E2F targets) for mRNA expression in breast cancer, invasion and metastasis (such as cell adhesion and epithelial-mesenchymal transition (EMT)) for CNV in breast cancer, and diverse processes (such as immune and inflammatory responses, cell adhesion, angiogenesis, and EMT) for mRNA expression in GBM. These observations persist in two external datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for breast cancer and Repository for Molecular Brain Neoplasia Data (REMBRANDT) for GBM) and are consistent with knowledge of cancer subtypes. We further compare the characteristics of MGSEA with several extensions of GSEA and point out the pros and cons of each method. CONCLUSIONS: We demonstrated the utility of MGSEA by inferring the combinatorial relations of multiple platforms for cancer subtype delineation in three multi-OMIC datasets: TCGA, METABRIC and REMBRANDT. The inferred combinatorial patterns are consistent with the current knowledge and also reveal novel insights about cancer subtypes. MGSEA can be further applied to any genotype-phenotype association problems with multimodal OMIC data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2716-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6421703
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-64217032019-03-28 MGSEA – a multivariate Gene set enrichment analysis Tiong, Khong-Loon Yeang, Chen-Hsiang BMC Bioinformatics Methodology Article BACKGROUND: Gene Set Enrichment Analysis (GSEA) is a powerful tool to identify enriched functional categories of informative biomarkers. Canonical GSEA takes one-dimensional feature scores derived from the data of one platform as inputs. Numerous extensions of GSEA handling multimodal OMIC data are proposed, yet none of them explicitly captures combinatorial relations of feature scores from multiple platforms. RESULTS: We propose multivariate GSEA (MGSEA) to capture combinatorial relations of gene set enrichment among multiple platform features. MGSEA successfully captures designed feature relations from simulated data. By applying it to the scores of delineating breast cancer and glioblastoma multiforme (GBM) subtypes from The Cancer Genome Atlas (TCGA) datasets of CNV, DNA methylation and mRNA expressions, we find that breast cancer and GBM data yield both similar and distinct outcomes. Among the enriched functional categories, subtype-specific biomarkers are dominated by mRNA expression in many functional categories in both cancer types and also by CNV in many functional categories in breast cancer. The enriched functional categories belonging to distinct combinatorial patterns are involved different oncogenic processes: cell proliferation (such as cell cycle control, estrogen responses, MYC and E2F targets) for mRNA expression in breast cancer, invasion and metastasis (such as cell adhesion and epithelial-mesenchymal transition (EMT)) for CNV in breast cancer, and diverse processes (such as immune and inflammatory responses, cell adhesion, angiogenesis, and EMT) for mRNA expression in GBM. These observations persist in two external datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) for breast cancer and Repository for Molecular Brain Neoplasia Data (REMBRANDT) for GBM) and are consistent with knowledge of cancer subtypes. We further compare the characteristics of MGSEA with several extensions of GSEA and point out the pros and cons of each method. CONCLUSIONS: We demonstrated the utility of MGSEA by inferring the combinatorial relations of multiple platforms for cancer subtype delineation in three multi-OMIC datasets: TCGA, METABRIC and REMBRANDT. The inferred combinatorial patterns are consistent with the current knowledge and also reveal novel insights about cancer subtypes. MGSEA can be further applied to any genotype-phenotype association problems with multimodal OMIC data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2716-6) contains supplementary material, which is available to authorized users. BioMed Central 2019-03-18 /pmc/articles/PMC6421703/ /pubmed/30885118 http://dx.doi.org/10.1186/s12859-019-2716-6 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Tiong, Khong-Loon
Yeang, Chen-Hsiang
MGSEA – a multivariate Gene set enrichment analysis
title MGSEA – a multivariate Gene set enrichment analysis
title_full MGSEA – a multivariate Gene set enrichment analysis
title_fullStr MGSEA – a multivariate Gene set enrichment analysis
title_full_unstemmed MGSEA – a multivariate Gene set enrichment analysis
title_short MGSEA – a multivariate Gene set enrichment analysis
title_sort mgsea – a multivariate gene set enrichment analysis
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6421703/
https://www.ncbi.nlm.nih.gov/pubmed/30885118
http://dx.doi.org/10.1186/s12859-019-2716-6
work_keys_str_mv AT tiongkhongloon mgseaamultivariategenesetenrichmentanalysis
AT yeangchenhsiang mgseaamultivariategenesetenrichmentanalysis