
Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma

BACKGROUND: One approach to improving the personalized treatment of cancer is to understand the cellular signaling transduction pathways that cause cancer at the level of the individual patient. In this study, we used unsupervised deep learning to learn the hierarchical structure within cancer gene...


Bibliographic Details
Main authors: Young, Jonathan D., Cai, Chunhui, Lu, Xinghua
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5629551/
https://www.ncbi.nlm.nih.gov/pubmed/28984190
http://dx.doi.org/10.1186/s12859-017-1798-2
author Young, Jonathan D.
Cai, Chunhui
Lu, Xinghua
collection PubMed
description BACKGROUND: One approach to improving the personalized treatment of cancer is to understand the cellular signaling transduction pathways that cause cancer at the level of the individual patient. In this study, we used unsupervised deep learning to learn the hierarchical structure within cancer gene expression data. Deep learning is a group of machine learning algorithms that use multiple layers of hidden units to capture hierarchically related, alternative representations of the input data. We hypothesize that this hierarchical structure learned by deep learning will be related to the cellular signaling system.
RESULTS: Robust deep learning model selection identified a network architecture that is biologically plausible. Our model selection results indicated that the 1st hidden layer of our deep learning model should contain about 1300 hidden units to most effectively capture the covariance structure of the input data. This agrees with the estimated number of human transcription factors, which is approximately 1400. This result lends support to our hypothesis that the 1st hidden layer of a deep learning model trained on gene expression data may represent signals related to transcription factor activation. Using the 3rd hidden layer representation of each tumor as learned by our unsupervised deep learning model, we performed consensus clustering on all tumor samples, leading to the discovery of clusters of glioblastoma multiforme with differential survival. One of these clusters contained all of the glioblastoma samples with G-CIMP, a known methylation phenotype driven by the IDH1 mutation and associated with favorable prognosis, suggesting that the hidden units in the 3rd hidden layer representations captured a methylation signal without explicitly using methylation data as input. We also found differentially expressed genes and well-known mutations (NF1, IDH1, EGFR) that were uniquely correlated with each of these clusters. Exploring these unique genes and mutations will allow us to further investigate the disease mechanisms underlying each of these clusters.
CONCLUSIONS: In summary, we show that a deep learning model can be trained to represent biologically and clinically meaningful abstractions of cancer gene expression data. Understanding what additional relationships these hidden layer abstractions have with the cancer cellular signaling system could have a significant impact on the understanding and treatment of cancer.
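The description above mentions consensus clustering on the hidden layer representation of each tumor but gives no implementation details. The following is only a minimal sketch of the general consensus clustering idea, not the authors' method. It assumes k-means as the base clusterer and a thresholded co-clustering graph as the final grouping step; both are illustrative choices. All names (`kmeans`, `consensus_matrix`, `consensus_clusters`, `toy`) and all parameter values are hypothetical, and the toy vectors stand in for hidden layer representations.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus_matrix(points, k, n_runs=50, frac=0.75, seed=0):
    """Fraction of subsampled runs in which each pair of samples co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def consensus_clusters(matrix, threshold=0.5):
    """Assumed final step: connected components of the graph whose edges
    link pairs with a consensus value above the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy stand-ins for hidden layer representations: two well-separated groups.
toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.4], [0.3, 0.6],
       [10.0, 20.0], [10.1, 20.2], [10.2, 20.4], [10.3, 20.6]]
matrix = consensus_matrix(toy, k=2)
labels = consensus_clusters(matrix)
print(labels)
```

On this toy input the first four vectors end up in one cluster and the last four in another. Real consensus clustering workflows usually also compare several values of k and often use hierarchical clustering on the consensus matrix, which this sketch omits.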
format Online
Article
Text
id pubmed-5629551
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5629551 2017-10-13 Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma Young, Jonathan D. Cai, Chunhui Lu, Xinghua BMC Bioinformatics Research BioMed Central 2017-10-03 /pmc/articles/PMC5629551/ /pubmed/28984190 http://dx.doi.org/10.1186/s12859-017-1798-2 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma
topic Research