
A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis

Cancer is a deadly disease that occurs due to rapid and uncontrolled cell growth. In this article, a machine learning (ML) algorithm is proposed to diagnose different cancer diseases from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker is...


Bibliographic Details
Main authors: Elemam, Tarneem, Elshrkawey, Mohamed
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9381276/
https://www.ncbi.nlm.nih.gov/pubmed/35983572
http://dx.doi.org/10.1155/2022/1056490
_version_ 1784769043931070464
author Elemam, Tarneem
Elshrkawey, Mohamed
author_facet Elemam, Tarneem
Elshrkawey, Mohamed
author_sort Elemam, Tarneem
collection PubMed
description Cancer is a deadly disease that occurs due to rapid and uncontrolled cell growth. In this article, a machine learning (ML) algorithm is proposed to diagnose different cancer diseases from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker is initiated to combine the results of three filter-based feature evaluation methods, namely, chi-squared, F-statistic, and mutual information (MI). The features are then ordered according to this combination. In the second stage, the modified wrapper-based sequential forward selection is utilized to discover the optimal feature subset, using ML models such as support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbor (KNN) classifiers. To examine the proposed algorithm, many tests have been carried out on four cancerous microarray datasets, employing in the process 10-fold cross-validation and hyperparameter tuning. The performance of the algorithm is evaluated by calculating the diagnostic accuracy. The results indicate that for the leukemia dataset, both SVM and KNN models register the highest accuracy at 100% using only 5 features. For the ovarian cancer dataset, the SVM model achieves the highest accuracy at 100% using only 6 features. For the small round blue cell tumor (SRBCT) dataset, the SVM model also achieves the highest accuracy at 100% using only 8 features. For the lung cancer dataset, the SVM model also achieves the highest accuracy at 99.57% using 19 features. By comparing with other algorithms, the results obtained from the proposed algorithm are superior in terms of the number of selected features and diagnostic accuracy.
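The two stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the three filter scores are combined or how the sequential forward selection is modified, so the mean-rank rule and the "keep a feature only if the score improves" rule below are assumptions. The score values and the `toy_evaluate` function are hypothetical stand-ins; in practice the scores might come from routines such as scikit-learn's `chi2`, `f_classif`, and `mutual_info_classif`, and the evaluation from 10-fold cross-validated accuracy of a classifier.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Give rank 1 to the highest-scoring feature, 2 to the next, and so on."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def overall_ranking(score_table: Dict[str, Sequence[float]]) -> List[int]:
    """Stage 1 (assumed rule): order feature indices by mean rank over all filters."""
    per_method = [ranks_from_scores(s) for s in score_table.values()]
    n_features = len(per_method[0])
    mean_rank = [
        sum(method[i] for method in per_method) / len(per_method)
        for i in range(n_features)
    ]
    return sorted(range(n_features), key=lambda i: mean_rank[i])


def forward_select(
    ordered_features: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Stage 2 (assumed rule): walk the ranked list once and keep a feature
    only if adding it strictly improves the evaluation score."""
    selected: List[int] = []
    best_score = float("-inf")
    for feature in ordered_features:
        score = evaluate(selected + [feature])
        if score > best_score:
            selected.append(feature)
            best_score = score
    return selected, best_score


# Hypothetical filter scores for four features (not taken from the article).
filter_scores = {
    "chi_squared": [5.0, 1.0, 3.0, 0.5],
    "f_statistic": [4.0, 0.5, 6.0, 1.0],
    "mutual_information": [0.9, 0.1, 0.7, 0.2],
}


def toy_evaluate(subset: List[int]) -> float:
    """Hypothetical stand-in for cross-validated accuracy: features 0 and 3
    are treated as informative, and every extra feature costs a small penalty."""
    informative = {0, 3}
    hits = len(informative.intersection(subset))
    return hits / len(informative) - 0.01 * len(subset)


ranking = overall_ranking(filter_scores)
chosen, chosen_score = forward_select(ranking, toy_evaluate)
print("ranking:", ranking)
print("selected:", chosen, "score:", round(chosen_score, 2))
```

Passing the evaluation as a callable keeps the selection loop independent of the classifier, so the same loop could be reused with SVM, DT, RF, or KNN models by swapping the function that scores a feature subset.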
format Online
Article
Text
id pubmed-9381276
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9381276 2022-08-17 A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis Elemam, Tarneem Elshrkawey, Mohamed ScientificWorldJournal Research Article
Hindawi 2022-08-09 /pmc/articles/PMC9381276/ /pubmed/35983572 http://dx.doi.org/10.1155/2022/1056490 Text en Copyright © 2022 Tarneem Elemam and Mohamed Elshrkawey. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Elemam, Tarneem
Elshrkawey, Mohamed
A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis
title A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis
title_sort highly discriminative hybrid feature selection algorithm for cancer diagnosis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9381276/
https://www.ncbi.nlm.nih.gov/pubmed/35983572
http://dx.doi.org/10.1155/2022/1056490
work_keys_str_mv AT elemamtarneem ahighlydiscriminativehybridfeatureselectionalgorithmforcancerdiagnosis
AT elshrkaweymohamed ahighlydiscriminativehybridfeatureselectionalgorithmforcancerdiagnosis
AT elemamtarneem highlydiscriminativehybridfeatureselectionalgorithmforcancerdiagnosis
AT elshrkaweymohamed highlydiscriminativehybridfeatureselectionalgorithmforcancerdiagnosis