Improved survival analysis by learning shared genomic information from pan-cancer data

Bibliographic Details
Main authors: Kim, Sunkyu, Kim, Keonwoo, Choe, Junseok, Lee, Inggeol, Kang, Jaewoo
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Studies of Phenotypes and Clinical Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355236/
https://www.ncbi.nlm.nih.gov/pubmed/32657401
http://dx.doi.org/10.1093/bioinformatics/btaa462
author Kim, Sunkyu
Kim, Keonwoo
Choe, Junseok
Lee, Inggeol
Kang, Jaewoo
collection PubMed
description MOTIVATION: Recent advances in deep learning have offered solutions to many biomedical tasks. However, applying deep learning to survival analysis using human cancer transcriptome data remains a challenge. Because the number of genes, which are the input variables of the survival model, is larger than the number of available cancer patient samples, deep-learning models are prone to overfitting. To address this issue, we introduce a new deep-learning architecture called VAECox, which uses transfer learning and fine-tuning. RESULTS: We pre-trained a variational autoencoder on all RNA-seq data in 20 TCGA datasets and transferred the trained weights to our survival prediction model. We then fine-tuned the transferred weights while training the survival model on each dataset. Our model outperformed previous models, such as Cox proportional hazards with LASSO and ridge penalties and Cox-nnet, on 7 of 10 TCGA datasets in terms of C-index. These results indicate that the information transferred from the entire cancer transcriptome data helped our survival prediction model reduce overfitting and perform robustly on unseen cancer patient samples. AVAILABILITY AND IMPLEMENTATION: Our implementation of VAECox is available at https://github.com/dmis-lab/VAECox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
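The abstract reports results in terms of the C-index (concordance index). As a reading aid, here is a minimal sketch of Harrell's C-index in plain Python. It is not taken from the VAECox repository: the function name `concordance_index`, its argument names, and the toy data are all hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index (illustrative sketch, not the paper's code).

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time is an observed event rather than a censoring time.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk for the earlier death: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): four patients, the second one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.3, 0.1, 0.5]
print(concordance_index(toy_times, toy_events, toy_risks))  # prints 0.75
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk, which is why the metric suits comparisons between survival models trained on censored data.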
format Online
Article
Text
id pubmed-7355236
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7355236 2020-07-16; Bioinformatics; Studies of Phenotypes and Clinical Applications; Oxford University Press; 2020-07; 2020-07-13; /pmc/articles/PMC7355236/; /pubmed/32657401; http://dx.doi.org/10.1093/bioinformatics/btaa462; Text; en
© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Improved survival analysis by learning shared genomic information from pan-cancer data
topic Studies of Phenotypes and Clinical Applications