
Multi-Omics Data Fusion for Cancer Molecular Subtyping Using Sparse Canonical Correlation Analysis

Bibliographic Details
Main authors: Qi, Lin; Wang, Wei; Wu, Tan; Zhu, Lina; He, Lingli; Wang, Xin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8341864/
https://www.ncbi.nlm.nih.gov/pubmed/34367231
http://dx.doi.org/10.3389/fgene.2021.607817
author Qi, Lin
Wang, Wei
Wu, Tan
Zhu, Lina
He, Lingli
Wang, Xin
collection PubMed
description It is now clear that major malignancies are heterogeneous diseases associated with diverse molecular properties and clinical outcomes, posing a great challenge for more individualized therapy. In the last decade, cancer molecular subtyping studies were mostly based on transcriptomic profiles, ignoring heterogeneity at other (epi-)genetic levels of gene regulation. Integrating multiple types of (epi)genomic data generates a more comprehensive landscape of biological processes, providing an opportunity to better dissect cancer heterogeneity. Here, we propose sparse canonical correlation analysis for cancer classification (SCCA-CC), which projects each type of single-omics data onto a unified space for data fusion, followed by clustering and classification analysis. Without loss of generality, as case studies, we integrated two types of omics data, mRNA and miRNA profiles, for molecular classification of ovarian cancer (n = 462) and breast cancer (n = 451). The two types of omics data were projected onto a unified space using SCCA, followed by data fusion to identify cancer subtypes. The subtypes we identified recapitulated subtypes previously recognized by other groups (all P-values < 0.001) but displayed more significant clinical associations. In ovarian cancer in particular, the four subtypes we identified were significantly associated with overall survival, whereas the taxonomy previously established by TCGA was not (P-values: 0.039 vs. 0.12). The multi-omics classifiers we established can not only classify individual types of data but also achieve higher accuracy on the fused data. Compared with iCluster, SCCA-CC identified subtypes of higher coherence and clinical relevance, and did so with better time efficiency. In conclusion, we developed an integrated bioinformatic framework, SCCA-CC, for cancer molecular subtyping. Using two case studies in breast and ovarian cancer, we demonstrated its effectiveness in identifying biologically meaningful and clinically relevant subtypes. SCCA-CC presented a unique advantage in its ability to classify both single-omics data and multi-omics data, which significantly extends its applicability to various data types and makes more efficient use of published omics resources.
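The project-then-fuse-then-cluster idea in the description can be illustrated with a minimal sketch. It is not the authors' SCCA-CC implementation: the record does not state which sparse CCA algorithm, fusion rule, or clustering method was used. The sketch assumes a single canonical component fitted by alternating soft-thresholded updates (in the spirit of penalized matrix decomposition, with a fixed relative threshold in place of a tuned L1 constraint), fusion by averaging the two projected scores, and a one-dimensional 2-means step. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def center(M):
    """Subtract each column's mean (rows are samples, columns are features)."""
    n = len(M)
    means = [sum(col) / n for col in zip(*M)]
    return [[x - m for x, m in zip(row, means)] for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def soft_unit(a, frac):
    """Soft-threshold at frac * max|a| (an assumed rule), then scale to unit L2 norm."""
    t = frac * max(abs(x) for x in a)
    s = [math.copysign(max(abs(x) - t, 0.0), x) for x in a]
    norm = math.sqrt(sum(x * x for x in s))
    if norm == 0.0:
        raise ValueError("all weights were thresholded to zero")
    return [x / norm for x in s]


def sparse_cca(X, Y, frac_u=0.5, frac_v=0.5, n_iter=50):
    """Rank-1 sparse CCA sketch: alternate u <- S(X^T Y v), v <- S(Y^T X u)."""
    X, Y = center(X), center(Y)
    Xt, Yt = transpose(X), transpose(Y)
    q = len(Y[0])
    v = [1.0 / math.sqrt(q)] * q  # uniform starting weights
    for _ in range(n_iter):
        u = soft_unit(matvec(Xt, matvec(Y, v)), frac_u)
        v = soft_unit(matvec(Yt, matvec(X, u)), frac_v)
    # Per-sample scores of each data type in the shared one-dimensional space.
    return u, v, matvec(X, u), matvec(Y, v)


def two_means_1d(s, n_iter=20):
    """Tiny 1-D k-means with k = 2, started from the minimum and maximum score."""
    lo, hi = min(s), max(s)
    labels = [0] * len(s)
    for _ in range(n_iter):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in s]
        g0 = [x for x, lab in zip(s, labels) if lab == 0]
        g1 = [x for x, lab in zip(s, labels) if lab == 1]
        lo, hi = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels


def toy_data(n_per_group=20, seed=0):
    """Two synthetic 'omics' blocks sharing one latent group signal (made-up data)."""
    rng = random.Random(seed)
    z = [1.0] * n_per_group + [-1.0] * n_per_group

    def block(n_signal, n_noise):
        return [[zi + rng.gauss(0, 0.3) for _ in range(n_signal)]
                + [rng.gauss(0, 1.0) for _ in range(n_noise)] for zi in z]

    return block(2, 4), block(2, 3)  # stand-ins for mRNA and miRNA matrices


def demo():
    X, Y = toy_data()
    u, v, sx, sy = sparse_cca(X, Y)
    fused = [(a + b) / 2 for a, b in zip(sx, sy)]  # assumed fusion rule: average
    return u, v, two_means_1d(fused)


u, v, labels = demo()
print("weights, block 1:", [round(w, 2) for w in u])
print("weights, block 2:", [round(w, 2) for w in v])
print("cluster labels:", labels)
```

On the toy data the largest weights fall on the first two features of each block, which are the only ones carrying the shared signal. Real use would need standardized features, several components, and tuned sparsity levels.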
format Online
Article
Text
id pubmed-8341864
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8341864 2021-08-06 Front Genet Genetics Frontiers Media S.A. 2021-07-22 /pmc/articles/PMC8341864/ /pubmed/34367231 http://dx.doi.org/10.3389/fgene.2021.607817 Text en Copyright © 2021 Qi, Wang, Wu, Zhu, He and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Multi-Omics Data Fusion for Cancer Molecular Subtyping Using Sparse Canonical Correlation Analysis
topic Genetics