
An Efficient Binary Sand Cat Swarm Optimization for Feature Selection in High-Dimensional Biomedical Data


Bibliographic Details
Main Author: Pashaei, Elnaz
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10604175/
https://www.ncbi.nlm.nih.gov/pubmed/37892853
http://dx.doi.org/10.3390/bioengineering10101123
author Pashaei, Elnaz
collection PubMed
description Recent breakthroughs are making a significant contribution to big data in biomedicine which are anticipated to assist in disease diagnosis and patient care management. To obtain relevant information from this data, effective administration and analysis are required. One of the major challenges associated with biomedical data analysis is the so-called “curse of dimensionality”. For this issue, a new version of Binary Sand Cat Swarm Optimization (called PILC-BSCSO), incorporating a pinhole-imaging-based learning strategy and crossover operator, is presented for selecting the most informative features. First, the crossover operator is used to strengthen the search capability of BSCSO. Second, the pinhole-imaging learning strategy is utilized to effectively increase exploration capacity while avoiding premature convergence. The Support Vector Machine (SVM) classifier with a linear kernel is used to assess classification accuracy. The experimental results show that the PILC-BSCSO algorithm beats 11 cutting-edge techniques in terms of classification accuracy and the number of selected features using three public medical datasets. Moreover, PILC-BSCSO achieves a classification accuracy of 100% for colon cancer, which is difficult to classify accurately, based on just 10 genes. A real Liver Hepatocellular Carcinoma (TCGA-HCC) data set was also used to further evaluate the effectiveness of the PILC-BSCSO approach. PILC-BSCSO identifies a subset of five marker genes, including prognostic biomarkers HMMR, CHST4, and COL15A1, that have excellent predictive potential for liver cancer using TCGA data.
format Online
Article
Text
id pubmed-10604175
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10604175 2023-10-28 An Efficient Binary Sand Cat Swarm Optimization for Feature Selection in High-Dimensional Biomedical Data Pashaei, Elnaz Bioengineering (Basel) Article MDPI 2023-09-25 /pmc/articles/PMC10604175/ /pubmed/37892853 http://dx.doi.org/10.3390/bioengineering10101123 Text en © 2023 by the author.
https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title An Efficient Binary Sand Cat Swarm Optimization for Feature Selection in High-Dimensional Biomedical Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10604175/
https://www.ncbi.nlm.nih.gov/pubmed/37892853
http://dx.doi.org/10.3390/bioengineering10101123