
Benchmarking variational AutoEncoders on cancer transcriptomics data

Deep generative models, such as variational autoencoders (VAE), have gained increasing attention in computational biology due to their ability to capture complex data manifolds which subsequently can be used to achieve better performance in downstream tasks, such as cancer type prediction or subtypi...


Bibliographic Details
Main Authors: Eltager, Mostafa; Abdelaal, Tamim; Charrout, Mohammed; Mahfouz, Ahmed; Reinders, Marcel J. T.; Makrodimitris, Stavros
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10553230/
https://www.ncbi.nlm.nih.gov/pubmed/37796856
http://dx.doi.org/10.1371/journal.pone.0292126
_version_ 1785116120422809600
author Eltager, Mostafa
Abdelaal, Tamim
Charrout, Mohammed
Mahfouz, Ahmed
Reinders, Marcel J. T.
Makrodimitris, Stavros
author_sort Eltager, Mostafa
collection PubMed
description Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks such as cancer type prediction or cancer subtyping. However, these models are difficult to train because of the large number of hyperparameters that need to be tuned. To better understand the importance of the different hyperparameters, we examined six different VAE models trained on TCGA transcriptomics data and evaluated them on two downstream tasks: cluster agreement with cancer subtypes and survival analysis. We studied the effect of the latent space dimensionality, learning rate, optimizer, initialization and activation function on the quality of these downstream tasks on the TCGA samples. We found that β-TCVAE and DIP-VAE perform well on average, despite being more sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting the different hyperparameter settings. To ensure generalization, we tested all hyperparameter configurations on the GTEx dataset. We found a significant correlation (ρ = 0.7) between the hyperparameter effects on clustering performance in the TCGA and GTEx datasets. This highlights the robustness and generalizability of our recommendations. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we measured the correlation and mutual information of the different representations with various data characteristics such as gender, age, days to metastasis, immune infiltration, and mutation signatures. We found that, for all models, the latent factors in general neither correlate uniquely with any single data characteristic nor capture separable information, even for models specifically designed for disentanglement.
format Online
Article
Text
id pubmed-10553230
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10553230 2023-10-06 PLoS One Research Article Public Library of Science 2023-10-05 /pmc/articles/PMC10553230/ /pubmed/37796856 http://dx.doi.org/10.1371/journal.pone.0292126 Text en © 2023 Eltager et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Benchmarking variational AutoEncoders on cancer transcriptomics data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10553230/
https://www.ncbi.nlm.nih.gov/pubmed/37796856
http://dx.doi.org/10.1371/journal.pone.0292126
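
Note on the reported correlation: the description above states a correlation of ρ = 0.7 between hyperparameter effects on clustering performance in TCGA and GTEx. The record does not say which coefficient was used. The sketch below assumes a rank correlation (Spearman's ρ) and shows how such a comparison could be computed with the Python standard library only. The function names and the effect values are hypothetical and are not taken from the article.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-configuration effects on a clustering score
# (one value per hyperparameter setting; NOT data from the article).
effects_tcga = [0.12, -0.05, 0.30, 0.01, -0.20]
effects_gtex = [0.10, 0.02, 0.25, -0.03, -0.15]

rho = spearman(effects_tcga, effects_gtex)
print(f"Spearman rho between the two sets of effects: {rho:.2f}")
```

A rank-based coefficient is a natural choice here because effect sizes on two datasets need not share a scale; only the ordering of hyperparameter settings has to agree.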