
Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle

Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze...


Bibliographic Details
Main authors: Valverde-Albacete, Francisco J., Peláez-Moreno, Carmen
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7844629/
https://www.ncbi.nlm.nih.gov/pubmed/33265588
http://dx.doi.org/10.3390/e20070498
_version_ 1783644387491708928
author Valverde-Albacete, Francisco J.
Peláez-Moreno, Carmen
author_facet Valverde-Albacete, Francisco J.
Peláez-Moreno, Carmen
author_sort Valverde-Albacete, Francisco J.
collection PubMed
description Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze the transformation of a discrete, multivariate source of information [Formula: see text] into a discrete, multivariate sink of information [Formula: see text] related by a distribution [Formula: see text]. The first contribution is a decomposition of the maximal potential entropy of [Formula: see text], which we call a balance equation, into its (a) non-transferable, (b) transferable, but not transferred, and (c) transferred parts. Such balance equations can be represented in (de Finetti) entropy diagrams, our second set of contributions. The most important of these, the aggregate channel multivariate entropy triangle, is a visual exploratory tool to assess the effectiveness of multivariate data transformations in transferring information from input to output variables. We also show how these decomposition and balance equations apply to the entropies of [Formula: see text] and [Formula: see text], respectively, and generate entropy triangles for them. As an example, we present the application of these tools to the assessment of information transfer efficiency for Principal Component Analysis and Independent Component Analysis as unsupervised feature transformation and selection procedures in supervised classification tasks.
format Online
Article
Text
id pubmed-7844629
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7844629 2021-02-24 Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle Valverde-Albacete, Francisco J. Peláez-Moreno, Carmen Entropy (Basel) Article MDPI 2018-06-27 /pmc/articles/PMC7844629/ /pubmed/33265588 http://dx.doi.org/10.3390/e20070498 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Valverde-Albacete, Francisco J.
Peláez-Moreno, Carmen
Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle
title Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7844629/
https://www.ncbi.nlm.nih.gov/pubmed/33265588
http://dx.doi.org/10.3390/e20070498