Reduced Data Sets and Entropy-Based Discretization
Results of experiments on numerical data sets discretized using two methods (global versions of Equal Frequency per Interval and Equal Interval Width) are presented. Globalization of both methods is based on entropy. For the discretized data sets, left and right reducts were computed. For each discretized...
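The two base methods named above can be illustrated with a minimal Python sketch. It also shows one plausible way to "globalize" them with entropy: every attribute starts with two intervals and, while the discretized table is inconsistent, the attribute whose intervals leave the highest conditional entropy of the decision receives one more interval. This selection rule, the function names (`equal_width_cuts`, `global_discretize`, and so on), and the `max_k` cap are assumptions for illustration only; the authors' exact entropy measure and stopping criterion may differ.

```python
import math
from collections import Counter, defaultdict


def equal_width_cuts(values, k):
    """Equal Interval Width: split [min, max] into k intervals of the same width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Equal Frequency per Interval: about len(values) / k sorted values per interval."""
    s = sorted(values)
    n = len(s)
    k = min(k, n)  # cannot have more intervals than values
    # Each cut is the midpoint between the last value of one interval and the first of the next.
    return sorted({(s[i * n // k - 1] + s[i * n // k]) / 2 for i in range(1, k)})


def discretize(value, cuts):
    """Interval index of value: 0 below the first cut, len(cuts) above the last."""
    return sum(value > c for c in cuts)


def conditional_entropy(column, decisions):
    """H(decision | discretized attribute), in bits."""
    n = len(decisions)
    blocks = defaultdict(list)
    for v, d in zip(column, decisions):
        blocks[v].append(d)
    h = 0.0
    for block in blocks.values():
        for count in Counter(block).values():
            p = count / len(block)
            h -= (len(block) / n) * p * math.log2(p)
    return h


def is_consistent(rows, decisions):
    """True if identical attribute-value vectors never carry different decisions."""
    seen = {}
    for row, d in zip(rows, decisions):
        if seen.setdefault(tuple(row), d) != d:
            return False
    return True


def global_discretize(table, decisions, method=equal_width_cuts, max_k=10):
    """Hypothetical entropy-driven globalization of a local discretization method."""
    n_attr = len(table[0])
    columns = [[row[a] for row in table] for a in range(n_attr)]
    k = [2] * n_attr  # every attribute starts with two intervals

    def apply():
        cuts = [method(columns[a], k[a]) for a in range(n_attr)]
        disc = [[discretize(v, cuts[a]) for v in columns[a]] for a in range(n_attr)]
        return cuts, disc

    cuts, disc = apply()
    while not is_consistent(zip(*disc), decisions):
        candidates = [a for a in range(n_attr) if k[a] < max_k]
        if not candidates:
            break  # cap reached (or the raw data are themselves inconsistent)
        # Refine the attribute whose intervals leave the decision most uncertain.
        worst = max(candidates, key=lambda a: conditional_entropy(disc[a], decisions))
        k[worst] += 1
        cuts, disc = apply()
    return cuts, [list(row) for row in zip(*disc)]
```

Refining only the worst attribute in each round keeps the remaining attributes coarse; passing `method=equal_frequency_cuts` instead gives the corresponding global Equal Frequency per Interval variant under the same assumptions.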
Main authors: | Grzymala-Busse, Jerzy W.; Hippe, Zdzislaw S.; Mroczek, Teresa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514355/ http://dx.doi.org/10.3390/e21111051 |
author | Grzymala-Busse, Jerzy W.; Hippe, Zdzislaw S.; Mroczek, Teresa |
---|---|
collection | PubMed |
description | Results of experiments on numerical data sets discretized using two methods (global versions of Equal Frequency per Interval and Equal Interval Width) are presented. Globalization of both methods is based on entropy. For the discretized data sets, left and right reducts were computed. For each discretized data set, and for the two data sets based on its left and right reducts, respectively, we applied ten-fold cross validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of error rate. Additionally, we compared the complexity of the generated decision trees. We show that reduction of data sets may only increase the error rate and that the decision trees generated from reduced data sets are not simpler than the decision trees generated from non-reduced data sets. |
format | Online Article Text |
id | pubmed-7514355 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Entropy (Basel) |
published | 2019-10-28 |
rights | © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | Reduced Data Sets and Entropy-Based Discretization |
topic | Article |
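The record names left and right reducts without defining them. Assuming a consistent discretized table, one plausible greedy reading is sketched below in Python: scan the attributes from left to right (left reduct) or from right to left (right reduct) and permanently drop each attribute whose removal keeps the table consistent. The function names (`greedy_reduct`, `left_reduct`, `right_reduct`) and the consistency-based criterion are illustrative assumptions, not the authors' published procedure.

```python
def is_consistent(rows, decisions):
    """True if identical attribute-value vectors never carry different decisions."""
    seen = {}
    for row, d in zip(rows, decisions):
        if seen.setdefault(tuple(row), d) != d:
            return False
    return True


def project(rows, attrs):
    """Keep only the attribute columns listed in attrs."""
    return [[row[a] for a in attrs] for row in rows]


def greedy_reduct(rows, decisions, order):
    """Try to drop attributes one at a time in the given order, keeping consistency."""
    kept = list(range(len(rows[0])))
    for a in order:
        trial = [b for b in kept if b != a]
        # Never drop the last attribute; drop a only if the table stays consistent.
        if trial and is_consistent(project(rows, trial), decisions):
            kept = trial
    return kept


def left_reduct(rows, decisions):
    """Scan attributes from the first (leftmost) to the last."""
    return greedy_reduct(rows, decisions, range(len(rows[0])))


def right_reduct(rows, decisions):
    """Scan attributes from the last (rightmost) to the first."""
    return greedy_reduct(rows, decisions, reversed(range(len(rows[0]))))


rows = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
decisions = ["a", "a", "b", "b"]
print(left_reduct(rows, decisions))   # [2]
print(right_reduct(rows, decisions))  # [0]
```

In this toy table the first and the third attribute each determine the decision on their own, so the scan direction decides which one survives; this is how left and right reducts can produce different reduced data sets from the same discretized table.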