
Reduced Data Sets and Entropy-Based Discretization

Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For discretized data sets, left and right reducts were computed. For each discretized...
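The two base discretization methods named in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the use of conditional entropy of the decision given one discretized attribute as the quality measure are assumptions made for illustration only, since the record does not describe how the entropy-based globalization is actually carried out.

```python
from collections import Counter, defaultdict
from math import log2


def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Return up to k-1 cut points so that each interval holds roughly
    len(values) / k cases. Assumes len(values) >= k."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, k):
        idx = i * n // k
        # Place the cut halfway between two neighboring sorted values.
        cut = (ordered[idx - 1] + ordered[idx]) / 2
        if not cuts or cut > cuts[-1]:  # skip duplicates caused by ties
            cuts.append(cut)
    return cuts


def discretize(values, cuts):
    """Map each value to its interval index (a value equal to a cut goes right)."""
    return [sum(v >= c for c in cuts) for v in values]


def conditional_entropy(labels, bins):
    """Entropy of the decision given the discretized attribute, in bits."""
    groups = defaultdict(list)
    for label, b in zip(labels, bins):
        groups[b].append(label)
    n = len(labels)
    total = 0.0
    for members in groups.values():
        counts = Counter(members)
        h = -sum((c / len(members)) * log2(c / len(members))
                 for c in counts.values())
        total += (len(members) / n) * h
    return total


# Hypothetical toy attribute and decision values, not taken from the article.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0]
labels = ["a", "a", "a", "b", "b", "b"]

width_cuts = equal_width_cuts(values, 2)
freq_cuts = equal_frequency_cuts(values, 2)
width_entropy = conditional_entropy(labels, discretize(values, width_cuts))
freq_entropy = conditional_entropy(labels, discretize(values, freq_cuts))
print(width_cuts, width_entropy)
print(freq_cuts, freq_entropy)
```

On this toy attribute the equal-frequency cut separates the two decision values completely, while the equal-width cut leaves one mixed interval; a lower conditional entropy indicates that the intervals predict the decision more cleanly.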


Bibliographic Details
Main authors: Grzymala-Busse, Jerzy W., Hippe, Zdzislaw S., Mroczek, Teresa
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514355/
http://dx.doi.org/10.3390/e21111051
_version_ 1783586568986951680
author Grzymala-Busse, Jerzy W.
Hippe, Zdzislaw S.
Mroczek, Teresa
author_facet Grzymala-Busse, Jerzy W.
Hippe, Zdzislaw S.
Mroczek, Teresa
author_sort Grzymala-Busse, Jerzy W.
collection PubMed
description Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For discretized data sets, left and right reducts were computed. For each discretized data set and two data sets, based, respectively, on left and right reducts, we applied ten-fold cross validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of error rate. Additionally, we compared the complexity of the generated decision trees. We show that reduction of data sets may only increase the error rate and that the decision trees generated from reduced decision sets are not simpler than the decision trees generated from non-reduced data sets.
format Online
Article
Text
id pubmed-7514355
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75143552020-11-09 Reduced Data Sets and Entropy-Based Discretization Grzymala-Busse, Jerzy W. Hippe, Zdzislaw S. Mroczek, Teresa Entropy (Basel) Article Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For discretized data sets, left and right reducts were computed. For each discretized data set and two data sets, based, respectively, on left and right reducts, we applied ten-fold cross validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of error rate. Additionally, we compared the complexity of the generated decision trees. We show that reduction of data sets may only increase the error rate and that the decision trees generated from reduced decision sets are not simpler than the decision trees generated from non-reduced data sets. MDPI 2019-10-28 /pmc/articles/PMC7514355/ http://dx.doi.org/10.3390/e21111051 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Grzymala-Busse, Jerzy W.
Hippe, Zdzislaw S.
Mroczek, Teresa
Reduced Data Sets and Entropy-Based Discretization
title Reduced Data Sets and Entropy-Based Discretization
title_full Reduced Data Sets and Entropy-Based Discretization
title_fullStr Reduced Data Sets and Entropy-Based Discretization
title_full_unstemmed Reduced Data Sets and Entropy-Based Discretization
title_short Reduced Data Sets and Entropy-Based Discretization
title_sort reduced data sets and entropy-based discretization
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514355/
http://dx.doi.org/10.3390/e21111051
work_keys_str_mv AT grzymalabussejerzyw reduceddatasetsandentropybaseddiscretization
AT hippezdzislaws reduceddatasetsandentropybaseddiscretization
AT mroczekteresa reduceddatasetsandentropybaseddiscretization