Real-World Data Difficulty Estimation with the Use of Entropy

In the era of the Internet of Things and big data, we are faced with the management of a flood of information. The complexity and amount of data presented to the decision-maker are enormous, and existing methods often fail to derive nonredundant information quickly. Thus, the selection of the most s...

Bibliographic Details
Main Authors: Juszczuk, Przemysław; Kozak, Jan; Dziczkowski, Grzegorz; Głowania, Szymon; Jach, Tomasz; Probierz, Barbara
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8700715/
https://www.ncbi.nlm.nih.gov/pubmed/34945927
http://dx.doi.org/10.3390/e23121621
author Juszczuk, Przemysław
Kozak, Jan
Dziczkowski, Grzegorz
Głowania, Szymon
Jach, Tomasz
Probierz, Barbara
collection PubMed
description In the era of the Internet of Things and big data, we are faced with the management of a flood of information. The complexity and amount of data presented to the decision-maker are enormous, and existing methods often fail to derive nonredundant information quickly. Thus, the selection of the most satisfactory set of solutions is often a struggle. This article investigates the possibilities of using the entropy measure as an indicator of data difficulty. To do so, we focus on real-world data covering various fields related to markets (the real estate market and financial markets), sports data, fake news data, and more. The problem is twofold. First, since we deal with unprocessed, inconsistent data, additional preprocessing is necessary. Second, we use the entropy-based measure to capture the nonredundant, noncorrelated core information from the data. Research is conducted using well-known algorithms from the classification domain to investigate the quality of solutions derived from the initial preprocessing and from the information indicated by the entropy measure. Finally, the best 25% of attributes (in the sense of the entropy measure) are selected to perform the whole classification procedure once again, and the results are compared.
format Online
Article
Text
id pubmed-8700715
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8700715 2021-12-24 Real-World Data Difficulty Estimation with the Use of Entropy Juszczuk, Przemysław; Kozak, Jan; Dziczkowski, Grzegorz; Głowania, Szymon; Jach, Tomasz; Probierz, Barbara Entropy (Basel) Article MDPI 2021-12-01 /pmc/articles/PMC8700715/ /pubmed/34945927 http://dx.doi.org/10.3390/e23121621 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Real-World Data Difficulty Estimation with the Use of Entropy
topic Article