A novel deep autoencoder based survival analysis approach for microarray dataset

BACKGROUND: Breast cancer is one of the major causes of mortality globally. Accordingly, various machine learning (ML) techniques have been deployed for diagnosis and survival prediction. Survival analysis methods are used to compute the survival probability and the most important factors affecting that proba...

Bibliographic Details
Main Authors: Torkey, Hanaa, Atlam, Mostafa, El-Fishawy, Nawal, Salem, Hanaa
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080419/
https://www.ncbi.nlm.nih.gov/pubmed/33981841
http://dx.doi.org/10.7717/peerj-cs.492
_version_ 1783685421739278336
author Torkey, Hanaa
Atlam, Mostafa
El-Fishawy, Nawal
Salem, Hanaa
collection PubMed
description BACKGROUND: Breast cancer is one of the major causes of mortality globally. Accordingly, various machine learning (ML) techniques have been deployed for diagnosis and survival prediction. Survival analysis methods are used to compute the survival probability and to identify the factors that most affect it. Most survival analysis methods are designed for clinical features (up to hundreds), so applying methods such as Cox regression to RNA-seq microarray data with many features (up to thousands) is considered a major challenge. METHODS: In this paper, a novel approach that applies an autoencoder to reduce the number of features is proposed. The approach reconstructs features and removes both noise within the data and features with zero variance across the samples, which facilitates extracting the highest-variance features (across the samples) that most influence the survival probabilities. It then estimates the survival probability for each patient by applying random survival forests and Cox regression. Because applying the autoencoder to thousands of features takes a long time, the model is run on a Graphics Processing Unit (GPU) to speed up the process. The model is then evaluated and compared with existing models on three different datasets in terms of run time, concordance index, and calibration curve, and the genes most related to survival are identified. Finally, the biological pathways and GO molecular functions are analyzed for these significant genes. RESULTS: We fine-tuned our autoencoder model on RNA-seq data from three datasets to train the weights in our survival prediction model, then used different samples from each dataset to test the model. The results show that the proposed AutoCox and AutoRandom algorithms, which are based on our autoencoder feature selection approach, achieve better concordance index results than the most recent deep learning approaches when applied to each dataset.
A weight is computed for each gene produced by our autoencoder model. The weights show the degree of each gene's effect on the survival probability. For instance, four of the most survival-related, experimentally validated genes are at the top of our list of discovered gene weights, including PTPRG, MYST1, BG683264, and AK094562 for the breast cancer gene expression dataset. Our approach improves survival analysis by speeding up the process, enhancing the prediction accuracy, and reducing the error rate in the estimated survival probability.
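Two steps named in the description, dropping features with zero variance across samples and scoring predictions with the concordance index, can be illustrated with a minimal sketch. This is not the authors' code: the function names `drop_zero_variance` and `concordance_index` are hypothetical, the data layout (one list of feature values per patient) is an assumption, and the concordance index shown is a simplified Harrell's C that skips pairs with tied survival times.

```python
from itertools import combinations
from statistics import pvariance


def drop_zero_variance(rows):
    """Return (kept_column_indices, filtered_rows), dropping constant columns.

    `rows` is assumed to hold one list of feature values per sample.
    """
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols) if pvariance([r[j] for r in rows]) > 0]
    return keep, [[r[j] for j in keep] for r in rows]


def concordance_index(times, events, risks):
    """Simplified Harrell's C for right-censored data.

    A pair is comparable when the subject with the shorter time had an
    observed event (event flag 1). The pair is concordant when that subject
    also has the higher predicted risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is unusable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, which is why the description treats a higher concordance index as better.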
format Online
Article
Text
id pubmed-8080419
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8080419 2021-05-11 A novel deep autoencoder based survival analysis approach for microarray dataset Torkey, Hanaa Atlam, Mostafa El-Fishawy, Nawal Salem, Hanaa PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-04-21 /pmc/articles/PMC8080419/ /pubmed/33981841 http://dx.doi.org/10.7717/peerj-cs.492 Text en ©2021 Torkey et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title A novel deep autoencoder based survival analysis approach for microarray dataset
topic Bioinformatics