A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities

Bibliographic Details
Main authors: Abiodun, Esther Omolara; Alabdulatif, Abdulatif; Abiodun, Oludare Isaac; Alawida, Moatsum; Alabdulatif, Abdullah; Alkhawaldeh, Rami S.
Format: Online Article Text
Language: English
Published: Springer London 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8361413/
https://www.ncbi.nlm.nih.gov/pubmed/34404964
http://dx.doi.org/10.1007/s00521-021-06406-8
author Abiodun, Esther Omolara
Alabdulatif, Abdulatif
Abiodun, Oludare Isaac
Alawida, Moatsum
Alabdulatif, Abdullah
Alkhawaldeh, Rami S.
collection PubMed
description Specialized data preparation techniques, such as data cleaning, outlier detection, missing-value imputation and feature selection (FS), are required to get the most out of data and, consequently, to achieve optimal performance from predictive models in classification tasks. FS is a vital technique: it lets models run faster, eliminates noisy data, removes redundancy, reduces overfitting, improves precision and increases generalization on test data. While conventional FS techniques have been used for classification tasks over the past few decades, they do not adequately reduce the high dimensionality of text feature spaces, which leads to inefficient predictive models. Emerging metaheuristic and hyper-heuristic optimization methods offer a new paradigm for FS because they can improve classification accuracy, reduce computational and storage demands, and solve complex optimization problems in less time. However, little is known about best practices for choosing among emerging FS methods on a case-by-case basis. The literature contains a mix of clear and unclear findings on how to apply these methods effectively, and applying them incorrectly can degrade precision, real-world feasibility and the predictive model's overall performance. This paper reviews the present state of FS with respect to metaheuristic and hyper-heuristic methods. Through a systematic literature review of over 200 articles, we set out the most recent findings and trends to guide analysts, practitioners and researchers in data analytics who seek clarity on understanding and implementing effective FS optimization methods for improved text classification.
format Online
Article
Text
id pubmed-8361413
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer London
record_format MEDLINE/PubMed
journal Neural Comput Appl
published 2021-08-13
rights © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities
topic Review Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8361413/
https://www.ncbi.nlm.nih.gov/pubmed/34404964
http://dx.doi.org/10.1007/s00521-021-06406-8