MOGSA: Integrative Single Sample Gene-set Analysis of Multiple Omics Data

Gene-set analysis (GSA) summarizes individual molecular measurements to more interpretable pathways or gene-sets and has become an indispensable step in the interpretation of large-scale omics data. However, GSA methods are limited to the analysis of single omics data. Here, we introduce a new compu...

Bibliographic Details
Main Authors: Meng, Chen; Basunia, Azfar; Peters, Bjoern; Gholami, Amin Moghaddas; Kuster, Bernhard; Culhane, Aedín C.
Format: Online Article Text
Language: English
Published: The American Society for Biochemistry and Molecular Biology 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6692785/
https://www.ncbi.nlm.nih.gov/pubmed/31243065
http://dx.doi.org/10.1074/mcp.TIR118.001251
author Meng, Chen
Basunia, Azfar
Peters, Bjoern
Gholami, Amin Moghaddas
Kuster, Bernhard
Culhane, Aedín C.
collection PubMed
description Gene-set analysis (GSA) summarizes individual molecular measurements to more interpretable pathways or gene-sets and has become an indispensable step in the interpretation of large-scale omics data. However, GSA methods are limited to the analysis of single omics data. Here, we introduce a new computation method termed multi-omics gene-set analysis (MOGSA), a multivariate single sample gene-set analysis method that integrates multiple experimental and molecular data types measured over the same set of samples. The method learns a low-dimensional representation of most variant correlated features (genes, proteins, etc.) across multiple omics data sets, transforms the features onto the same scale and calculates an integrated gene-set score from the most informative features in each data type. MOGSA does not require filtering data to the intersection of features (gene IDs); therefore, all molecular features, including those that lack annotation, may be included in the analysis. Using simulated data, we demonstrate that integrating multiple diverse sources of molecular data increases the power to discover subtle changes in gene-sets and may reduce the impact of unreliable information in any single data type. Using real experimental data, we demonstrate three use-cases of MOGSA. First, we show how to remove a source of noise (technical or biological) in integrative MOGSA of NCI60 transcriptome and proteome data. Second, we apply MOGSA to discover similarities and differences in mRNA, protein and phosphorylation profiles of a small study of stem cell lines and assess the influence of each data type or feature on the total gene-set score. Finally, we apply MOGSA to cluster analysis and show that three molecular subtypes are robustly discovered when copy number variation and mRNA data of 308 bladder cancers from The Cancer Genome Atlas are integrated using MOGSA. MOGSA is available in the Bioconductor R package “mogsa.”
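The integration step described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code of the Bioconductor "mogsa" package: the function name `integrated_gene_set_scores`, the weighting of each data block by its first singular value (a choice borrowed from multiple factor analysis), the use of a plain SVD on the stacked blocks, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def integrated_gene_set_scores(blocks, annotations, n_components=1):
    """Hypothetical helper: integrated single-sample gene-set scores.

    blocks:      list of (features x samples) arrays, same samples in the same order
    annotations: list of (features x gene_sets) 0/1 arrays, one per block
    Returns a (gene_sets x samples) score matrix.
    """
    scaled = []
    for X in blocks:
        Xc = X - X.mean(axis=1, keepdims=True)        # center each feature across samples
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]   # largest singular value of the block
        scaled.append(Xc / s1)                        # put blocks on a comparable scale
    K = np.vstack(scaled)                             # stack all features; no ID intersection needed

    # Low-dimensional representation shared by all blocks.
    U, d, Vt = np.linalg.svd(K, full_matrices=False)
    U, d, Vt = U[:, :n_components], d[:n_components], Vt[:n_components]

    # Rank-k reconstruction of the stacked data, summed over the features of each gene-set.
    G = np.vstack(annotations)
    return G.T @ (U * d) @ Vt


# Simulated data (assumption): 6 samples, the first 3 carry a shifted gene-set.
rng = np.random.default_rng(0)
mrna = rng.normal(scale=0.1, size=(20, 6))
protein = rng.normal(scale=0.1, size=(15, 6))
mrna[:5, :3] += 2.0       # gene-set members in the mRNA block
protein[:4, :3] += 2.0    # gene-set members in the protein block

g_mrna = np.zeros((20, 1))
g_mrna[:5, 0] = 1         # remaining rows stay 0: features without annotation
g_protein = np.zeros((15, 1))
g_protein[:4, 0] = 1

scores = integrated_gene_set_scores([mrna, protein], [g_mrna, g_protein])
print(scores.shape)
```

Features with an all-zero annotation row still shape the shared components but add nothing directly to a score, which mirrors the abstract's point that unannotated features can remain in the analysis.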
format Online
Article
Text
id pubmed-6692785
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher The American Society for Biochemistry and Molecular Biology
record_format MEDLINE/PubMed
spelling pubmed-6692785 2019-08-15 Mol Cell Proteomics Technological Innovation and Resources The American Society for Biochemistry and Molecular Biology 2019-08-09 2019-06-26 /pmc/articles/PMC6692785/ /pubmed/31243065 http://dx.doi.org/10.1074/mcp.TIR118.001251 Text en © 2019 Meng et al. Published by The American Society for Biochemistry and Molecular Biology, Inc. Author's Choice: Final version open access under the terms of the Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0).
title MOGSA: Integrative Single Sample Gene-set Analysis of Multiple Omics Data
topic Technological Innovation and Resources