
Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples


Bibliographic Details
Main Authors: Hernández, Damián G., Samengo, Inés
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515115/
https://www.ncbi.nlm.nih.gov/pubmed/33267337
http://dx.doi.org/10.3390/e21060623
_version_ 1783586744596168704
author Hernández, Damián G.
Samengo, Inés
author_facet Hernández, Damián G.
Samengo, Inés
author_sort Hernández, Damián G.
collection PubMed
description Determining the strength of nonlinear, statistical dependencies between two variables is a crucial matter in many research fields. The established measure for quantifying such relations is the mutual information. However, estimating mutual information from limited samples is a challenging task. Since the mutual information is the difference of two entropies, the existing Bayesian estimators of entropy may be used to estimate information. This procedure, however, is still biased in the severely under-sampled regime. Here, we propose an alternative estimator that is applicable to those cases in which the marginal distribution of one of the two variables—the one with minimal entropy—is well sampled. The other variable, as well as the joint and conditional distributions, can be severely undersampled. We obtain a consistent estimator that presents very low bias, outperforming previous methods even when the sampled data contain few coincidences. As with other Bayesian estimators, our proposal focuses on the strength of the interaction between the two variables, without seeking to model the specific way in which they are related. A distinctive property of our method is that the main data statistics determining the amount of mutual information is the inhomogeneity of the conditional distribution of the low-entropy variable in those states in which the large-entropy variable registers coincidences.
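The abstract's starting point, that mutual information is a difference of entropies, can be sketched with a minimal plug-in (maximum-likelihood) estimator. This is not the Bayesian estimator the paper proposes; it is the naive baseline whose bias in the undersampled regime motivates the work. The function names are illustrative, not from the paper.

```python
import math
from collections import Counter

def entropy(counts):
    """Plug-in entropy in bits from a vector of occurrence counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) in bits.

    This naive estimator is biased upward when samples are scarce
    relative to the number of joint states; the paper targets
    exactly that regime with a Bayesian alternative.
    """
    xs = Counter(x for x, _ in pairs)
    ys = Counter(y for _, y in pairs)
    xy = Counter(pairs)
    return entropy(xs.values()) + entropy(ys.values()) - entropy(xy.values())

# Perfectly correlated binary variables: I(X;Y) = 1 bit.
samples = [(0, 0), (1, 1)] * 50
print(mutual_information(samples))  # → 1.0
```

With independent variables (e.g. all four binary pairs equally often) the same function returns 0 bits, since H(X,Y) = H(X) + H(Y) in that case.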
format Online
Article
Text
id pubmed-7515115
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75151152020-11-09 Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples Hernández, Damián G. Samengo, Inés Entropy (Basel) Article Determining the strength of nonlinear, statistical dependencies between two variables is a crucial matter in many research fields. The established measure for quantifying such relations is the mutual information. However, estimating mutual information from limited samples is a challenging task. Since the mutual information is the difference of two entropies, the existing Bayesian estimators of entropy may be used to estimate information. This procedure, however, is still biased in the severely under-sampled regime. Here, we propose an alternative estimator that is applicable to those cases in which the marginal distribution of one of the two variables—the one with minimal entropy—is well sampled. The other variable, as well as the joint and conditional distributions, can be severely undersampled. We obtain a consistent estimator that presents very low bias, outperforming previous methods even when the sampled data contain few coincidences. As with other Bayesian estimators, our proposal focuses on the strength of the interaction between the two variables, without seeking to model the specific way in which they are related. A distinctive property of our method is that the main data statistics determining the amount of mutual information is the inhomogeneity of the conditional distribution of the low-entropy variable in those states in which the large-entropy variable registers coincidences. MDPI 2019-06-25 /pmc/articles/PMC7515115/ /pubmed/33267337 http://dx.doi.org/10.3390/e21060623 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hernández, Damián G.
Samengo, Inés
Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples
title Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples
title_full Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples
title_fullStr Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples
title_full_unstemmed Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples
title_short Estimating the Mutual Information between Two Discrete, Asymmetric Variables with Limited Samples
title_sort estimating the mutual information between two discrete, asymmetric variables with limited samples
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515115/
https://www.ncbi.nlm.nih.gov/pubmed/33267337
http://dx.doi.org/10.3390/e21060623
work_keys_str_mv AT hernandezdamiang estimatingthemutualinformationbetweentwodiscreteasymmetricvariableswithlimitedsamples
AT samengoines estimatingthemutualinformationbetweentwodiscreteasymmetricvariableswithlimitedsamples