Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers

Bibliographic Details
Main authors: Al Mamun, Abdullah; Tanvir, Raihanul Bari; Sobhan, Masrur; Mathee, Kalai; Narasimhan, Giri; Holt, Gregory E.; Mondal, Ananda Mohan
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8584911/
https://www.ncbi.nlm.nih.gov/pubmed/34769351
http://dx.doi.org/10.3390/ijms222111919
collection PubMed
description Background: Long non-coding RNAs (lncRNAs) play a vital role in altering the expression profiles of target genes, changes that can lead to cancer development. Identifying prognostic lncRNAs associated with different cancers may therefore help in developing cancer therapies. Method: To discover the critical lncRNAs that can identify the origin of different cancers, we propose using a state-of-the-art deep learning algorithm, the concrete autoencoder (CAE), in an unsupervised setting; the CAE efficiently identifies a subset of the most informative features. However, because of its stochastic nature, the CAE does not select reproducible features across different runs. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The underlying assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. Genome-wide lncRNA expression profiles of 12 cancer types, comprising a total of 4768 samples from The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified in multiple CAE runs were added to a final list of key lncRNAs capable of identifying the 12 cancer types. Results: mrCAE performed better at feature selection than single-run CAE, a standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that identify the origin of the 12 cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. Conclusion: The proposed mrCAE, which selects actual features, outperformed the AE even though the AE uses latent or pseudo-features. Because it selects actual features rather than pseudo-features, mrCAE can be valuable for precision medicine.
The identified prognostic lncRNAs can be further studied to develop therapies for different cancers.
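A minimal Python sketch, assuming NumPy, of the two ideas the description relies on. `concrete_sample` shows the concrete (Gumbel-softmax) relaxation that makes each CAE run stochastic, and `aggregate_runs` shows one frequency-based way to keep features that recur across runs. The function names, the `min_runs` threshold, the frequency ranking, and the example feature indices are illustrative assumptions, not the authors' exact mrCAE procedure; training the autoencoder itself is omitted.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence

import numpy as np


def concrete_sample(logits: np.ndarray, temperature: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Relaxed one-hot selection weights, one row per selector neuron.

    logits has shape (k, d): k selector neurons over d input features.
    Adding Gumbel noise and applying a tempered softmax is the concrete
    (Gumbel-softmax) relaxation; as temperature -> 0 each sampled row
    approaches a one-hot vector that picks a single input feature.
    """
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))  # standard Gumbel noise
    z = (logits + gumbel) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)


def selected_features(logits: np.ndarray) -> List[int]:
    """Features picked by a trained selector layer: the argmax of each row."""
    return sorted(set(int(i) for i in logits.argmax(axis=1)))


def aggregate_runs(runs: Sequence[Iterable[int]], min_runs: int = 2,
                   top_k: Optional[int] = None) -> List[int]:
    """Keep features chosen in at least `min_runs` runs, most frequent first."""
    counts: Counter = Counter()
    for chosen in runs:
        counts.update(set(chosen))  # count a feature at most once per run
    stable = [f for f, c in counts.items() if c >= min_runs]
    stable.sort(key=lambda f: (-counts[f], f))  # ties broken by feature index
    return stable if top_k is None else stable[:top_k]


# Hypothetical feature indices selected by three independent CAE runs.
runs = [[3, 7, 12, 40], [7, 12, 19, 40], [2, 7, 12, 33]]
print(aggregate_runs(runs, min_runs=2))  # [7, 12, 40]
```

Raising `min_runs` keeps fewer but more reproducible features, which mirrors the stated assumption that features appearing in multiple runs carry more meaningful information.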
id pubmed-8584911
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Int J Mol Sci
published MDPI 2021-11-03
rights © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).