A weighted-sum chaotic sparrow search algorithm for interdisciplinary feature selection and data classification

Bibliographic Details
Main authors: Jia, LiYun; Wang, Tao; Gad, Ahmed G.; Salem, Ahmed
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462760/
https://www.ncbi.nlm.nih.gov/pubmed/37640716
http://dx.doi.org/10.1038/s41598-023-38252-0
Collection: PubMed
Description: In today’s data-driven digital culture, there is a critical demand for optimized solutions that reduce operating expenses while increasing productivity. The memory and processing time available for handling enormous volumes of data are subject to a number of limitations, and the problem is compounded when a dataset contains redundant and uninformative information. For instance, many datasets contain non-informative features that mainly mislead a given classification algorithm. To tackle this, researchers have developed a variety of feature selection (FS) techniques that aim to eliminate unnecessary information from raw datasets before feeding them to a machine learning (ML) algorithm. Meta-heuristic optimization algorithms are often a solid choice for NP-hard problems such as FS. In this study, we present a wrapper FS technique based on the sparrow search algorithm (SSA), a meta-heuristic. SSA is a swarm intelligence (SI) method that stands out for its fast convergence and stability. Like most SI algorithms, however, SSA suffers from low swarm diversity and weak exploration in late iterations. We therefore use ten chaotic maps to improve SSA in three ways: (i) generating the initial swarm; (ii) substituting two random variables in SSA; and (iii) clamping sparrows that cross the boundaries of the search range. The result is CSSA, a chaotic form of SSA. Extensive comparisons show CSSA to be superior in swarm diversity and convergence speed when solving various representative functions from the Institute of Electrical and Electronics Engineers (IEEE) Congress on Evolutionary Computation (CEC) benchmark set.
Furthermore, experiments with CSSA on eighteen interdisciplinary, multi-scale ML datasets from the University of California, Irvine (UCI) data repository, as well as three high-dimensional microarray datasets, demonstrate that CSSA outperforms twelve state-of-the-art algorithms in FS-based classification. Finally, a statistical post-hoc analysis at the 5% significance level, based on Wilcoxon’s signed-rank test, Friedman’s rank test, and Nemenyi’s test, confirms the statistical significance of CSSA’s advantage in overall fitness, classification accuracy, selected feature size, computational time, convergence trace, and stability.
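The description names three places where chaotic values replace uniform randomness in SSA, and the title refers to a weighted-sum objective, but this record gives neither the maps nor the formula. The following is a minimal, hypothetical sketch of those ideas, not the authors' code. It assumes the logistic map as the chaotic source (the record does not name the ten maps), a continuous search range thresholded at 0.5 to select features, an assumed boundary rule that re-draws an out-of-range coordinate from the chaotic sequence, and an assumed fitness `alpha * error + (1 - alpha) * selected / total`. Step (ii) is not shown because the record does not say which two random variables are replaced. All function names are illustrative.

```python
from typing import Iterator, List


def logistic_map(x0: float = 0.7, r: float = 4.0) -> Iterator[float]:
    """Endless chaotic sequence x_{k+1} = r * x_k * (1 - x_k) in [0, 1].

    Assumption: the logistic map is only an illustrative choice of chaotic map.
    With r = 4, avoid the degenerate seeds 0, 0.25, 0.5, 0.75 and 1.
    """
    x = x0
    while True:
        x = r * x * (1.0 - x)
        yield x


def chaotic_init(pop_size: int, dim: int, lb: float, ub: float,
                 chaos: Iterator[float]) -> List[List[float]]:
    """(i) Build the initial swarm by mapping chaotic values onto [lb, ub]."""
    return [[lb + next(chaos) * (ub - lb) for _ in range(dim)]
            for _ in range(pop_size)]


def chaotic_clamp(position: List[float], lb: float, ub: float,
                  chaos: Iterator[float]) -> List[float]:
    """(iii) Assumed rule: a coordinate outside [lb, ub] is re-drawn from the
    chaotic sequence instead of being pinned to the violated bound."""
    return [x if lb <= x <= ub else lb + next(chaos) * (ub - lb)
            for x in position]


def to_feature_mask(position: List[float], threshold: float = 0.5) -> List[bool]:
    """Binarize a position in [0, 1]^dim: keep feature j when x_j > threshold."""
    return [x > threshold for x in position]


def weighted_sum_fitness(error_rate: float, n_selected: int, n_total: int,
                         alpha: float = 0.99) -> float:
    """Assumed weighted sum of classification error and the fraction of
    selected features (lower is better); alpha = 0.99 is an illustrative default."""
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total


if __name__ == "__main__":
    chaos = logistic_map(x0=0.7)
    swarm = chaotic_init(pop_size=5, dim=8, lb=0.0, ub=1.0, chaos=chaos)
    # Hypothetical overshooting move standing in for an SSA position update.
    moved = [x + 0.4 for x in swarm[0]]
    position = chaotic_clamp(moved, 0.0, 1.0, chaos)
    mask = to_feature_mask(position)
    # error_rate would come from a classifier trained on the masked features;
    # 0.10 is a placeholder, not a reported result.
    print(mask, weighted_sum_fitness(0.10, sum(mask), len(mask)))
```

Sharing one chaotic generator across initialization and clamping keeps every draw on the same deterministic trajectory; swapping in another map only requires replacing `logistic_map`.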
ID: pubmed-10462760
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (Scientific Reports), Article, published 2023-08-28
© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.