RN-Autoencoder: Reduced Noise Autoencoder for classifying imbalanced cancer genomic data

Bibliographic Details
Main authors: Arafa, Ahmed; El-Fishawy, Nawal; Badawy, Mohammed; Radad, Marwa
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9887895/
https://www.ncbi.nlm.nih.gov/pubmed/36717866
http://dx.doi.org/10.1186/s13036-022-00319-3
collection PubMed
description BACKGROUND: In the current genomic era, gene expression datasets have become one of the main tools used in cancer classification. Both the curse of dimensionality and class imbalance are inherent characteristics of these datasets, and they degrade the performance of most classifiers applied to genomic cancer data. RESULTS: This paper introduces the Reduced Noise-Autoencoder (RN-Autoencoder) for pre-processing imbalanced genomic datasets for precise cancer classification. First, RN-Autoencoder addresses the curse of dimensionality by using an autoencoder for feature reduction, generating new extracted data with lower dimensionality. Next, RN-Autoencoder passes the extracted data to the Reduced Noise-Synthetic Minority Over-sampling Technique (RN-SMOTE), which handles the class imbalance in the extracted data. RN-Autoencoder has been evaluated using different classifiers and various imbalanced datasets with different imbalance ratios. The results showed that classifier performance improved with RN-Autoencoder, exceeding the performance on both the original data and the extracted data by margins that depend on the classifier, dataset and evaluation metric. RN-Autoencoder was also compared with the current state of the art and yielded increases in test accuracy of up to 18.017%, 19.183%, 18.58% and 8.87% on the colon, leukemia, Diffuse Large B-Cell Lymphoma (DLBCL) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets, respectively. CONCLUSION: RN-Autoencoder is a model for cancer classification using imbalanced gene expression datasets. It uses the autoencoder to reduce the high dimensionality of the gene expression datasets and then handles the class imbalance using RN-SMOTE. RN-Autoencoder has been evaluated using many different classifiers and many different imbalanced datasets. The performance of many classifiers improved, and some classified cancer with 100% on all metrics used. In addition, RN-Autoencoder outperformed many recent works using the same datasets.
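The oversampling stage described in the abstract can be illustrated with a minimal sketch. The abstract does not detail RN-SMOTE's noise-reduction step or the autoencoder architecture, so the code below shows only plain SMOTE-style interpolation between minority-class neighbours, applied to features that are assumed to be already extracted. The function name `smote_oversample`, the parameters `k` and `seed`, and the toy feature values are hypothetical and are not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Plain SMOTE only: this is NOT the paper's RN-SMOTE, whose
    noise-reduction step is not described in the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy "extracted" features (hypothetical values): 6 majority, 3 minority samples
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Generate just enough synthetic samples to equalise the two classes
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without leaving the region its real samples occupy. In the pipeline the abstract describes, this step would run on the low-dimensional output of the autoencoder rather than on the raw gene expression values.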
id pubmed-9887895
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9887895 2023-02-01 RN-Autoencoder: Reduced Noise Autoencoder for classifying imbalanced cancer genomic data. Arafa, Ahmed; El-Fishawy, Nawal; Badawy, Mohammed; Radad, Marwa. J Biol Eng. Research. BioMed Central 2023-01-30 /pmc/articles/PMC9887895/ /pubmed/36717866 http://dx.doi.org/10.1186/s13036-022-00319-3 Text en © The Author(s) 2023.
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title RN-Autoencoder: Reduced Noise Autoencoder for classifying imbalanced cancer genomic data
topic Research