Multi-Omics Data Fusion via a Joint Kernel Learning Model for Cancer Subtype Discovery and Essential Gene Identification

The multiple sources of cancer determine its multiple causes, and the same cancer can be composed of many different subtypes. Identification of cancer subtypes is a key part of personalized cancer treatment and provides an important reference for clinical diagnosis and treatment. Some studies have s...

Bibliographic Details
Main Authors: Feng, Jie; Jiang, Limin; Li, Shuhao; Tang, Jijun; Wen, Lan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7969795/
https://www.ncbi.nlm.nih.gov/pubmed/33747053
http://dx.doi.org/10.3389/fgene.2021.647141
_version_ 1783666300104474624
author Feng, Jie
Jiang, Limin
Li, Shuhao
Tang, Jijun
Wen, Lan
author_sort Feng, Jie
collection PubMed
description Cancer has multiple sources and therefore multiple causes, and the same cancer can be composed of many different subtypes. Identification of cancer subtypes is a key part of personalized cancer treatment and provides an important reference for clinical diagnosis and treatment. Some studies have shown that there are significant differences in the genetic and epigenetic profiles among different cancer subtypes during carcinogenesis and development. In this study, we first collect seven cancer datasets from the Broad Institute GDAC Firehose, including gene expression profiles, isoform expression profiles, DNA methylation data, and the corresponding survival information. Furthermore, we employ kernel principal component analysis (kernel PCA) to extract features from each profile, convert the extracted features into three similarity kernel matrices using a Gaussian kernel function, and then fuse these matrices into a global kernel matrix. Finally, we apply a spectral clustering algorithm to the global kernel matrix to obtain the clustering of cancer subtypes. In the experiments, besides using the P-value from the Cox regression model and survival analysis as the primary evaluation measures, we also introduce statistical indicators such as the Rand index (RI) and adjusted Rand index (ARI) to verify clustering performance. Then, combining the clustering results with the gene expression profiles, we identify genes that are differentially expressed among subtypes by gene set enrichment analysis. For lung cancer, GMPS, EPHA10, C10orf54, and MAGEA6 are highly expressed in different subtypes; for liver cancer, CMYA5, DEPDC6, FAU, VPS24, RCBTB2, LOC100133469, and SLC35B4 are significantly expressed in different subtypes.
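The pipeline outlined in the description (kernel PCA per omics profile, Gaussian similarity kernels, fusion into one global kernel, spectral clustering, then RI/ARI) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the synthetic data standing in for the three profiles, the Gaussian kernel used inside kernel PCA, the median-distance bandwidth, the number of components (5), the unweighted average used as the fusion step (the paper's joint kernel learning may weight the kernels differently), and all function names.

```python
from collections import Counter
from math import comb

import numpy as np


def gaussian_kernel(X, sigma=None):
    """Gaussian similarity between rows of X (assumed bandwidth: median distance)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))


def kernel_pca(K, n_components):
    """Project samples onto the leading components of the centered kernel K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    vals, vecs = np.linalg.eigh(H @ K @ H)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 1e-12))


def spectral_clustering(K, k, n_iter=50):
    """Normalized spectral clustering on a precomputed affinity matrix K."""
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))              # D^{-1/2} K D^{-1/2}
    _, vecs = np.linalg.eigh(A)
    U = vecs[:, -k:]                             # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Plain k-means with a deterministic farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def rand_indices(a, b):
    """Return (Rand index, adjusted Rand index) for two label lists."""
    total = comb(len(a), 2)
    s_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    s_a = sum(comb(c, 2) for c in Counter(a).values())
    s_b = sum(comb(c, 2) for c in Counter(b).values())
    ri = (total + 2 * s_ab - s_a - s_b) / total
    expected = s_a * s_b / total
    # Undefined (division by zero) if both labelings put all samples in one cluster.
    ari = (s_ab - expected) / ((s_a + s_b) / 2 - expected)
    return ri, ari


# Synthetic stand-ins for three omics profiles of the same 40 samples
# (hypothetical data: two groups of 20 with shifted means).
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
views = [rng.normal(loc=3.0 * truth[:, None], scale=1.0, size=(40, p))
         for p in (50, 30, 40)]

# Per view: kernel PCA features, then a Gaussian similarity kernel.
kernels = [gaussian_kernel(kernel_pca(gaussian_kernel(X), 5)) for X in views]
fused = np.mean(kernels, axis=0)                 # assumed fusion: simple average
labels = spectral_clustering(fused, k=2)
ri, ari = rand_indices(truth.tolist(), labels.tolist())
print(f"RI = {ri:.3f}, ARI = {ari:.3f}")
```

With real data, the rows of each matrix would be the same patients in the same order across all three profiles, and the survival-based evaluation mentioned in the description (Cox regression P-value) would replace the ground-truth comparison, since true subtype labels are normally unavailable.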
format Online
Article
Text
id pubmed-7969795
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7969795 2021-03-19 Front Genet Genetics Frontiers Media S.A. 
2021-03-04 /pmc/articles/PMC7969795/ /pubmed/33747053 http://dx.doi.org/10.3389/fgene.2021.647141 Text en Copyright © 2021 Feng, Jiang, Li, Tang and Wen. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Multi-Omics Data Fusion via a Joint Kernel Learning Model for Cancer Subtype Discovery and Essential Gene Identification
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7969795/
https://www.ncbi.nlm.nih.gov/pubmed/33747053
http://dx.doi.org/10.3389/fgene.2021.647141