Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty

Interaction between variables is often found in statistical models, and it is usually expressed in the model as an additional term when the variables are numeric. However, when the variables are categorical (also known as nominal or qualitative) or mixed numerical-categorical, defining, detecting, a...

Bibliographic Details
Main authors: Gómez-Guerrero, Santiago; Ortiz, Inocencio; Sosa-Cabrera, Gustavo; García-Torres, Miguel; Schaerer, Christian E.
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8774864/
https://www.ncbi.nlm.nih.gov/pubmed/35052090
http://dx.doi.org/10.3390/e24010064
author Gómez-Guerrero, Santiago
Ortiz, Inocencio
Sosa-Cabrera, Gustavo
García-Torres, Miguel
Schaerer, Christian E.
collection PubMed
description Interaction between variables is often found in statistical models, and it is usually expressed in the model as an additional term when the variables are numeric. However, when the variables are categorical (also known as nominal or qualitative) or mixed numerical-categorical, defining, detecting, and measuring interactions is not a simple task. In this work, based on an entropy-based correlation measure for n nominal variables, named Multivariate Symmetrical Uncertainty (MSU), we propose a formal and broader definition for the interaction of the variables. Two series of experiments are presented. In the first series, we observe that datasets where some record types or combinations of categories are absent, forming patterns of records, often display interactions among their attributes. In the second series, the interaction/non-interaction behavior of a regression model (entirely built on continuous variables) is successfully replicated under a discretized version of the dataset. It is shown that there is an interaction-wise correspondence between the continuous and the discretized versions of the dataset. Hence, we demonstrate that the proposed definition of interaction enabled by the MSU is a valuable tool for detecting and measuring interactions within linear and non-linear models.
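The description above refers to an entropy-based correlation measure for n nominal variables. A minimal sketch of such a measure follows, assuming the commonly cited form MSU = n/(n-1) · (1 − H(X1,…,Xn) / Σ H(Xi)); this record does not state the formula, so treat it as an assumption. The function names `entropy` and `msu` and the XOR toy data are hypothetical, chosen only for illustration, and the comparison of group versus pairwise values is an informal reading, not the article's formal definition of interaction.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of hashable items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def msu(columns):
    """Multivariate Symmetrical Uncertainty of equally long categorical columns.

    Assumed formula: n/(n-1) * (1 - H(joint) / sum of marginal entropies).
    """
    n = len(columns)
    if n < 2:
        raise ValueError("MSU needs at least two variables")
    marginal_sum = sum(entropy(col) for col in columns)
    if marginal_sum == 0:
        # Every column is constant; returning 0.0 is a convention chosen here.
        return 0.0
    joint = entropy(list(zip(*columns)))  # each record becomes one joint symbol
    return (n / (n - 1)) * (1 - joint / marginal_sum)


# Toy data (hypothetical): c is the XOR of a and b, so no pair of columns is
# informative on its own, but the three columns together are fully dependent.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [x ^ y for x, y in zip(a, b)]

pairwise = [msu([a, b]), msu([a, c]), msu([b, c])]
group = msu([a, b, c])
print("pairwise MSU:", pairwise)
print("three-way MSU:", group)
```

In this toy dataset half of the eight possible records never occur, and the three-way value exceeds every pairwise value, which is the kind of pattern the description associates with interaction among attributes.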
format Online
Article
Text
id pubmed-8774864
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8774864 2022-01-21 Entropy (Basel) Article MDPI 2021-12-30 /pmc/articles/PMC8774864/ /pubmed/35052090 http://dx.doi.org/10.3390/e24010064 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty
topic Article