A novel deep autoencoder based survival analysis approach for microarray dataset
BACKGROUND: Breast cancer is one of the major causes of mortality globally. Therefore, different Machine Learning (ML) techniques were deployed for computing survival and diagnosis. Survival analysis methods are used to compute survival probability and the most important factors affecting that proba...
Main authors: | Torkey, Hanaa; Atlam, Mostafa; El-Fishawy, Nawal; Salem, Hanaa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080419/ https://www.ncbi.nlm.nih.gov/pubmed/33981841 http://dx.doi.org/10.7717/peerj-cs.492 |
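
The abstract in this record describes discarding features with zero variance across samples and favouring those with the highest variance. A minimal sketch of that filtering idea in plain Python follows. It is a simple variance filter, not the authors' autoencoder, and the function name `top_variance_genes`, the toy expression matrix, and the gene labels `G1` to `G3` are hypothetical.

```python
from statistics import pvariance


def top_variance_genes(matrix, gene_names, top_k):
    """Return up to top_k gene names ranked by variance across samples.

    matrix     -- list of samples, each a list of expression values
    gene_names -- one label per column of matrix
    Genes whose variance across samples is zero are dropped first.
    """
    # Transpose so that each tuple holds one gene's values across all samples.
    columns = list(zip(*matrix))
    scored = [(name, pvariance(col)) for name, col in zip(gene_names, columns)]
    # Drop constant genes, then sort the rest by decreasing variance.
    informative = [(name, var) for name, var in scored if var > 0]
    informative.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in informative[:top_k]]


# Toy data (hypothetical): 3 samples x 3 genes; G1 is constant across samples.
toy_matrix = [
    [1, 5, 0],
    [1, 7, 10],
    [1, 9, 20],
]
toy_genes = ["G1", "G2", "G3"]
selected = top_variance_genes(toy_matrix, toy_genes, top_k=2)
print(selected)
```

In a pipeline like the one the abstract outlines, a reduced feature set of this kind would then be passed to a survival model such as Cox regression or a random survival forest.
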
_version_ | 1783685421739278336 |
---|---|
author | Torkey, Hanaa; Atlam, Mostafa; El-Fishawy, Nawal; Salem, Hanaa |
author_sort | Torkey, Hanaa |
collection | PubMed |
description | BACKGROUND: Breast cancer is one of the major causes of mortality globally. Therefore, different Machine Learning (ML) techniques have been deployed for survival estimation and diagnosis. Survival analysis methods are used to compute the survival probability and the most important factors affecting that probability. Most survival analysis methods are designed for clinical features (up to hundreds); hence, applying survival analysis methods such as Cox regression to RNA-seq microarray data with many features (up to thousands) is considered a major challenge. METHODS: In this paper, a novel approach that applies an autoencoder to reduce the number of features is proposed. Our approach works on feature reconstruction and on the removal of noise within the data and of features with zero variance across the samples, which facilitates extraction of the features with the highest variances (across the samples) that most influence the survival probabilities. Then, it estimates the survival probability for each patient by applying random survival forests and Cox regression. Applying the autoencoder to thousands of features takes a long time, so our model is run on the Graphics Processing Unit (GPU) to speed up the process. The model is then evaluated and compared with existing models on three different datasets in terms of run time, concordance index, and calibration curve, and the genes most related to survival are identified. Finally, the biological pathways and GO molecular functions are analyzed for these significant genes. RESULTS: We fine-tuned our autoencoder model on RNA-seq data from three datasets to train the weights in our survival prediction model, then used different samples from each dataset to test the model. The results show that the proposed AutoCox and AutoRandom algorithms, based on our feature selection autoencoder approach, achieve better concordance index results than the most recent deep learning approaches when applied to each dataset. The weight of each gene resulting from our autoencoder model is computed. The weights show the degree of effect of each gene on the survival probability. For instance, four of the most survival-related, experimentally validated genes are at the top of our list of discovered gene weights, including PTPRG, MYST1, BG683264, and AK094562 for the breast cancer gene expression dataset. Our approach improves survival analysis by speeding up the process, enhancing the prediction accuracy, and reducing the error rate in the estimated survival probability. |
format | Online Article Text |
id | pubmed-8080419 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8080419 2021-05-11 A novel deep autoencoder based survival analysis approach for microarray dataset Torkey, Hanaa; Atlam, Mostafa; El-Fishawy, Nawal; Salem, Hanaa PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-04-21 /pmc/articles/PMC8080419/ /pubmed/33981841 http://dx.doi.org/10.7717/peerj-cs.492 Text en ©2021 Torkey et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
title | A novel deep autoencoder based survival analysis approach for microarray dataset |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080419/ https://www.ncbi.nlm.nih.gov/pubmed/33981841 http://dx.doi.org/10.7717/peerj-cs.492 |
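
The abstract lists the concordance index among the evaluation criteria. As an illustration only, the sketch below computes Harrell's concordance index in plain Python for right-censored data. The function name `concordance_index`, its argument names, and the toy values are assumptions made for this example; it is not the authors' evaluation code, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- observed follow-up time per patient
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is anchored on a patient whose event was actually observed.
        if not events[i]:
            continue
        for j in range(n):
            # Comparable only if patient j was still under observation
            # strictly after patient i's event time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event, higher risk: correct
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (hypothetical): the third patient is censored at time 15.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
perfect_risk = [0.9, 0.6, 0.4, 0.1]    # risk falls as survival time grows
reversed_risk = [0.1, 0.4, 0.6, 0.9]   # exactly the wrong ordering

print(concordance_index(toy_times, toy_events, perfect_risk))
print(concordance_index(toy_times, toy_events, reversed_risk))
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is ranked in the wrong order.
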