
A hybrid machine learning feature selection model—HMLFSM to enhance gene classification applied to multiple colon cancers dataset


Bibliographic Details
Main Authors: Al-Rajab, Murad, Lu, Joan, Xu, Qiang, Kentour, Mohamed, Sawsa, Ahlam, Shuweikeh, Emad, Joy, Mike, Arasaradnam, Ramesh
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10621932/
https://www.ncbi.nlm.nih.gov/pubmed/37917732
http://dx.doi.org/10.1371/journal.pone.0286791
author Al-Rajab, Murad
Lu, Joan
Xu, Qiang
Kentour, Mohamed
Sawsa, Ahlam
Shuweikeh, Emad
Joy, Mike
Arasaradnam, Ramesh
collection PubMed
description Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification: they analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, because gene expression is variable and cancer-related datasets are high-dimensional, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model, HMLFSM (Hybrid Machine Learning Feature Selection Model), to improve colon cancer gene classification. We developed a multifilter hybrid model with a two-phase feature selection approach that combines Information Gain (IG) with Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) with Particle Swarm Optimization (PSO). We tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (accuracies of 95%, ~97%, and ~94% for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. From our experiments and literature review, we found that, for colon cancer gene analysis, selective input feature extraction prior to feature selection is essential for improving predictive performance.
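The Information Gain (IG) filter named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the median-based binarisation of expression values, the function names, the gene names, and the toy data are all assumptions made for illustration, and the GA, mRMR, and PSO stages are not shown.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG of one gene after splitting its expression at the median (an assumption)."""
    cut = median(values)
    high = [y for v, y in zip(values, labels) if v > cut]
    low = [y for v, y in zip(values, labels) if v <= cut]
    # Weighted entropy of the class labels within each non-empty half.
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (high, low) if part
    )
    return entropy(labels) - conditional


def top_k_genes(expression, labels, k):
    """Return the names of the k genes with the highest information gain."""
    scores = {gene: information_gain(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up toy data: 6 samples and 3 hypothetical genes.
labels = ["tumour", "tumour", "tumour", "normal", "normal", "normal"]
expression = {
    "gene_a": [9.1, 8.7, 9.4, 2.0, 2.3, 1.8],  # separates the two classes cleanly
    "gene_b": [5.0, 1.0, 5.2, 4.9, 1.1, 1.2],  # only weakly related to the class
    "gene_c": [3.0, 3.1, 2.9, 3.0, 3.1, 2.9],  # carries no class information
}

print(top_k_genes(expression, labels, k=2))
```

In a hybrid scheme of the kind described, a filter like this would only shortlist candidate genes; a search stage such as a GA would then evaluate subsets of the shortlist with a classifier.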
format Online
Article
Text
id pubmed-10621932
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10621932 2023-11-03 A hybrid machine learning feature selection model—HMLFSM to enhance gene classification applied to multiple colon cancers dataset Al-Rajab, Murad Lu, Joan Xu, Qiang Kentour, Mohamed Sawsa, Ahlam Shuweikeh, Emad Joy, Mike Arasaradnam, Ramesh PLoS One Research Article Public Library of Science 2023-11-02 /pmc/articles/PMC10621932/ /pubmed/37917732 http://dx.doi.org/10.1371/journal.pone.0286791 Text en © 2023 Al-Rajab et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A hybrid machine learning feature selection model—HMLFSM to enhance gene classification applied to multiple colon cancers dataset
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10621932/
https://www.ncbi.nlm.nih.gov/pubmed/37917732
http://dx.doi.org/10.1371/journal.pone.0286791