Cargando…

Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer

Artificial intelligence-based unsupervised deep learning (DL) is widely used to mine multimodal big data. However, there are few applications of this technology to cancer genomics. We aim to develop DL models to extract deep features from the breast cancer gene expression data and copy number altera...

Descripción completa

Detalles Bibliográficos
Autores principales: Liu, Qian, Hu, Pingzhao
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6520782/
https://www.ncbi.nlm.nih.gov/pubmed/30959966
http://dx.doi.org/10.3390/cancers11040494
_version_ 1783418810548617216
author Liu, Qian
Hu, Pingzhao
author_facet Liu, Qian
Hu, Pingzhao
author_sort Liu, Qian
collection PubMed
description Artificial intelligence-based unsupervised deep learning (DL) is widely used to mine multimodal big data. However, there are few applications of this technology to cancer genomics. We aim to develop DL models to extract deep features from the breast cancer gene expression data and copy number alteration (CNA) data separately and jointly. We hypothesize that the deep features are associated with patients’ clinical characteristics and outcomes. Two unsupervised denoising autoencoders (DAs) were developed to extract deep features from TCGA (The Cancer Genome Atlas) breast cancer gene expression and CNA data separately and jointly. A heat map was used to view and cluster patients into subgroups based on these DL features. Fisher’s exact test and Pearson’ Chi-square test were applied to test the associations of patients’ groups and clinical information. Survival differences between the groups were evaluated by Kaplan–Meier (KM) curves. Associations between each of the features and patient’s overall survival were assessed using Cox’s proportional hazards (COX-PH) model and a risk score for each feature set from the different omics data sets was generated from the survival regression coefficients. The risk scores for each feature set were binarized into high- and low-risk patient groups to evaluate survival differences using KM curves. Furthermore, the risk scores were traced back to their gene level DAs weights so that the three gene lists for each of the genomic data points were generated to perform gene set enrichment analysis. Patients were clustered into two groups based on concatenated features from the gene expression and CNA data and these two groups showed different overall survival rates (p-value = 0.049) and different ER (Estrogen receptor) statuses (p-value = 0.002, OR (odds ratio) = 0.626). All the risk scores from the gene expression and CNA data and their concatenated one were significantly associated with breast cancer survival. The patients with the high-risk group were significantly associated with patients’ worse outcomes (p-values ≤ 0.0023). The concatenated risk score was enriched by the AMP-activated protein kinase (AMPK) signaling pathway, the regulation of DNA-templated transcription, the regulation of nucleic acid-templated transcription, the regulation of apoptotic process, the positive regulation of gene expression, the positive regulation of cell proliferation, heart morphogenesis, the regulation of cellular macromolecule biosynthetic process, with FDR (false discovery rate) less than 0.05. We confirmed DAs can effectively extract meaningful genomic features from genomic data and concatenating multiple data sources can improve the significance of the features associated with breast cancer patients’ clinical characteristics and outcomes.
format Online
Article
Text
id pubmed-6520782
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-65207822019-05-31 Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer Liu, Qian Hu, Pingzhao Cancers (Basel) Article Artificial intelligence-based unsupervised deep learning (DL) is widely used to mine multimodal big data. However, there are few applications of this technology to cancer genomics. We aim to develop DL models to extract deep features from the breast cancer gene expression data and copy number alteration (CNA) data separately and jointly. We hypothesize that the deep features are associated with patients’ clinical characteristics and outcomes. Two unsupervised denoising autoencoders (DAs) were developed to extract deep features from TCGA (The Cancer Genome Atlas) breast cancer gene expression and CNA data separately and jointly. A heat map was used to view and cluster patients into subgroups based on these DL features. Fisher’s exact test and Pearson’ Chi-square test were applied to test the associations of patients’ groups and clinical information. Survival differences between the groups were evaluated by Kaplan–Meier (KM) curves. Associations between each of the features and patient’s overall survival were assessed using Cox’s proportional hazards (COX-PH) model and a risk score for each feature set from the different omics data sets was generated from the survival regression coefficients. The risk scores for each feature set were binarized into high- and low-risk patient groups to evaluate survival differences using KM curves. Furthermore, the risk scores were traced back to their gene level DAs weights so that the three gene lists for each of the genomic data points were generated to perform gene set enrichment analysis. Patients were clustered into two groups based on concatenated features from the gene expression and CNA data and these two groups showed different overall survival rates (p-value = 0.049) and different ER (Estrogen receptor) statuses (p-value = 0.002, OR (odds ratio) = 0.626). All the risk scores from the gene expression and CNA data and their concatenated one were significantly associated with breast cancer survival. The patients with the high-risk group were significantly associated with patients’ worse outcomes (p-values ≤ 0.0023). The concatenated risk score was enriched by the AMP-activated protein kinase (AMPK) signaling pathway, the regulation of DNA-templated transcription, the regulation of nucleic acid-templated transcription, the regulation of apoptotic process, the positive regulation of gene expression, the positive regulation of cell proliferation, heart morphogenesis, the regulation of cellular macromolecule biosynthetic process, with FDR (false discovery rate) less than 0.05. We confirmed DAs can effectively extract meaningful genomic features from genomic data and concatenating multiple data sources can improve the significance of the features associated with breast cancer patients’ clinical characteristics and outcomes. MDPI 2019-04-07 /pmc/articles/PMC6520782/ /pubmed/30959966 http://dx.doi.org/10.3390/cancers11040494 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Liu, Qian
Hu, Pingzhao
Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer
title Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer
title_full Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer
title_fullStr Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer
title_full_unstemmed Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer
title_short Association Analysis of Deep Genomic Features Extracted by Denoising Autoencoders in Breast Cancer
title_sort association analysis of deep genomic features extracted by denoising autoencoders in breast cancer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6520782/
https://www.ncbi.nlm.nih.gov/pubmed/30959966
http://dx.doi.org/10.3390/cancers11040494
work_keys_str_mv AT liuqian associationanalysisofdeepgenomicfeaturesextractedbydenoisingautoencodersinbreastcancer
AT hupingzhao associationanalysisofdeepgenomicfeaturesextractedbydenoisingautoencodersinbreastcancer