
Feature-Based Complexity Measure for Multinomial Classification Datasets

Machine learning algorithms are frequently used for classification problems on tabular datasets. In order to make informed decisions about model selection and design, it is crucial to gain meaningful insights into the complexity of these datasets. Feature-based complexity measures are a set of compl...


Bibliographic Details
Main Authors: Erwin, Kyle, Engelbrecht, Andries
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378522/
https://www.ncbi.nlm.nih.gov/pubmed/37509947
http://dx.doi.org/10.3390/e25071000
_version_ 1785079787865243648
author Erwin, Kyle
Engelbrecht, Andries
author_facet Erwin, Kyle
Engelbrecht, Andries
author_sort Erwin, Kyle
collection PubMed
description Machine learning algorithms are frequently used for classification problems on tabular datasets. In order to make informed decisions about model selection and design, it is crucial to gain meaningful insights into the complexity of these datasets. Feature-based complexity measures are a set of complexity measures that evaluates how useful features are at discriminating instances of different classes. This paper, however, shows that existing feature-based measures are inadequate in accurately measuring the complexity of various synthetic classification datasets, particularly those with multiple classes. This paper proposes a new feature-based complexity measure called the F5 measure, which evaluates the discriminative power of features for each class by identifying long sequences of uninterrupted instances of the same class. It is shown that the F5 measure better represents the feature complexity of a dataset.
format Online
Article
Text
id pubmed-10378522
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10378522 2023-07-29 Feature-Based Complexity Measure for Multinomial Classification Datasets Erwin, Kyle Engelbrecht, Andries Entropy (Basel) Article Machine learning algorithms are frequently used for classification problems on tabular datasets. In order to make informed decisions about model selection and design, it is crucial to gain meaningful insights into the complexity of these datasets. Feature-based complexity measures are a set of complexity measures that evaluates how useful features are at discriminating instances of different classes. This paper, however, shows that existing feature-based measures are inadequate in accurately measuring the complexity of various synthetic classification datasets, particularly those with multiple classes. This paper proposes a new feature-based complexity measure called the F5 measure, which evaluates the discriminative power of features for each class by identifying long sequences of uninterrupted instances of the same class. It is shown that the F5 measure better represents the feature complexity of a dataset. MDPI 2023-06-29 /pmc/articles/PMC10378522/ /pubmed/37509947 http://dx.doi.org/10.3390/e25071000 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Erwin, Kyle
Engelbrecht, Andries
Feature-Based Complexity Measure for Multinomial Classification Datasets
title Feature-Based Complexity Measure for Multinomial Classification Datasets
title_full Feature-Based Complexity Measure for Multinomial Classification Datasets
title_fullStr Feature-Based Complexity Measure for Multinomial Classification Datasets
title_full_unstemmed Feature-Based Complexity Measure for Multinomial Classification Datasets
title_short Feature-Based Complexity Measure for Multinomial Classification Datasets
title_sort feature-based complexity measure for multinomial classification datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378522/
https://www.ncbi.nlm.nih.gov/pubmed/37509947
http://dx.doi.org/10.3390/e25071000
work_keys_str_mv AT erwinkyle featurebasedcomplexitymeasureformultinomialclassificationdatasets
AT engelbrechtandries featurebasedcomplexitymeasureformultinomialclassificationdatasets
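
Illustrative note: the description above says the F5 measure looks for long sequences of uninterrupted instances of the same class. The record does not give the paper's actual procedure, so the following is only a minimal sketch of that general idea, not the authors' F5 measure. It assumes instances are ordered by a single feature's value and reports, per class, the length of the longest uninterrupted run. The function name `longest_class_runs` and the toy data are hypothetical.

```python
from itertools import groupby


def longest_class_runs(feature_values, labels):
    """Return {class: longest uninterrupted run length} after sorting by one feature.

    Assumption (not taken from the paper): instances are ordered by the
    feature value, and ties keep their input order (Python's sort is stable).
    """
    # Indices of the instances, ordered by feature value.
    order = sorted(range(len(feature_values)), key=lambda i: feature_values[i])
    sorted_labels = [labels[i] for i in order]

    runs = {}
    # groupby yields one group per maximal block of consecutive equal labels.
    for label, group in groupby(sorted_labels):
        length = sum(1 for _ in group)
        runs[label] = max(runs.get(label, 0), length)
    return runs


# Toy data, made up purely for illustration.
feature = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
classes = ["a", "a", "a", "b", "a", "b", "b", "c"]
print(longest_class_runs(feature, classes))  # {'a': 3, 'b': 2, 'c': 1}
```

In this toy example, class "a" forms one run of three instances before being interrupted, which loosely suggests the feature separates part of that class well. How the published measure scores or aggregates such runs is not stated in this record.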