Cargando…

Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees

In this study, we present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi mutual information by minimum spanning trees (MSTs). The extent to which random variables are dependent is an important question, e.g., for uncertainty quantification and sen...

Descripción completa

Detalles Bibliográficos
Autores principales: Eggels, Anne, Crommelin, Daan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514583/
https://www.ncbi.nlm.nih.gov/pubmed/33266816
http://dx.doi.org/10.3390/e21020100
_version_ 1783586621461889024
author Eggels, Anne
Crommelin, Daan
author_facet Eggels, Anne
Crommelin, Daan
author_sort Eggels, Anne
collection PubMed
description In this study, we present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi mutual information by minimum spanning trees (MSTs). The extent to which random variables are dependent is an important question, e.g., for uncertainty quantification and sensitivity analysis. The latter is closely related to the question how strongly dependent the output of, e.g., a computer simulation, is on the individual random input variables. To estimate the Rényi mutual information from data, we use a method due to Hero et al. that relies on computing minimum spanning trees (MSTs) of the data and uses the length of the MST in an estimator for the entropy. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods to compute approximations to the exact MST, and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. Because the MST computation does not require knowledge (or estimation) of the distributions, our methodology is well-suited for situations where only data are available. Furthermore, we show that, in the case where only the ranking of several dependencies is required rather than their exact value, it is not necessary to compute the Rényi divergence, but only an estimator derived from it. The main contributions of this paper are the introduction of this quantifier of dependency, as well as the novel combination of using approximate methods for MSTs with estimating the Rényi mutual information via MSTs. We applied our proposed method to an artificial test case based on the Ishigami function, as well as to a real-world test case involving an El Nino dataset.
format Online
Article
Text
id pubmed-7514583
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75145832020-11-09 Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees Eggels, Anne Crommelin, Daan Entropy (Basel) Article In this study, we present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi mutual information by minimum spanning trees (MSTs). The extent to which random variables are dependent is an important question, e.g., for uncertainty quantification and sensitivity analysis. The latter is closely related to the question how strongly dependent the output of, e.g., a computer simulation, is on the individual random input variables. To estimate the Rényi mutual information from data, we use a method due to Hero et al. that relies on computing minimum spanning trees (MSTs) of the data and uses the length of the MST in an estimator for the entropy. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods to compute approximations to the exact MST, and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. Because the MST computation does not require knowledge (or estimation) of the distributions, our methodology is well-suited for situations where only data are available. Furthermore, we show that, in the case where only the ranking of several dependencies is required rather than their exact value, it is not necessary to compute the Rényi divergence, but only an estimator derived from it. The main contributions of this paper are the introduction of this quantifier of dependency, as well as the novel combination of using approximate methods for MSTs with estimating the Rényi mutual information via MSTs. We applied our proposed method to an artificial test case based on the Ishigami function, as well as to a real-world test case involving an El Nino dataset. MDPI 2019-01-22 /pmc/articles/PMC7514583/ /pubmed/33266816 http://dx.doi.org/10.3390/e21020100 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Eggels, Anne
Crommelin, Daan
Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees
title Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees
title_full Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees
title_fullStr Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees
title_full_unstemmed Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees
title_short Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees
title_sort quantifying data dependencies with rényi mutual information and minimum spanning trees
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514583/
https://www.ncbi.nlm.nih.gov/pubmed/33266816
http://dx.doi.org/10.3390/e21020100
work_keys_str_mv AT eggelsanne quantifyingdatadependencieswithrenyimutualinformationandminimumspanningtrees
AT crommelindaan quantifyingdatadependencieswithrenyimutualinformationandminimumspanningtrees