Biological Data Analysis as an Information Theory Problem: Multivariable Dependence Measures and the Shadows Algorithm

Information theory is valuable in multiple-variable analysis for being model-free and nonparametric, and for the modest sensitivity to undersampling. We previously introduced a general approach to finding multiple dependencies that provides accurate measures of levels of dependency for subsets of va...

Bibliographic Details
Main Authors: Sakhanenko, Nikita A., Galas, David J.
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc. 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642827/
https://www.ncbi.nlm.nih.gov/pubmed/26335709
http://dx.doi.org/10.1089/cmb.2015.0051
_version_ 1782400428078530560
author Sakhanenko, Nikita A.
Galas, David J.
author_facet Sakhanenko, Nikita A.
Galas, David J.
author_sort Sakhanenko, Nikita A.
collection PubMed
description Information theory is valuable in multiple-variable analysis because it is model-free and nonparametric and only modestly sensitive to undersampling. We previously introduced a general approach to finding multiple dependencies that provides an accurate measure of the level of dependency for subsets of variables in a data set; the measure is significantly nonzero only if the subset of variables is collectively dependent. This is useful, however, only if we can avoid a combinatorial explosion of calculations for increasing numbers of variables. The proposed dependence measure for a subset of variables τ, the differential interaction information Δ(τ), has the property that, for subsets of τ, some of the factors of Δ(τ) are significantly nonzero when the full dependence includes more variables. We use this property to suppress the combinatorial explosion by following the “shadows” of multivariable dependency on smaller subsets. Rather than calculating the marginal entropies of all subsets at each degree level, we need to consider only calculations for subsets of variables with appropriate “shadows.” The number of calculations for n variables at a degree level of d therefore grows at a much smaller rate than the binomial coefficient (n, d), but it depends on the parameters of the “shadows” calculation. By avoiding a combinatorial explosion, this approach enables the use of our multivariable measures on very large data sets. We demonstrate this method on simulated data sets and characterize the effects of noise and sample numbers. In addition, we analyze a data set of a few thousand mutant yeast strains interacting with a few thousand chemical compounds.
format Online
Article
Text
id pubmed-4642827
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Mary Ann Liebert, Inc.
record_format MEDLINE/PubMed
spelling pubmed-4642827 2015-11-20 Biological Data Analysis as an Information Theory Problem: Multivariable Dependence Measures and the Shadows Algorithm Sakhanenko, Nikita A. Galas, David J. J Comput Biol Research Articles Mary Ann Liebert, Inc.
2015-11-01 /pmc/articles/PMC4642827/ /pubmed/26335709 http://dx.doi.org/10.1089/cmb.2015.0051 Text en © The Author(s) 2015; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
spellingShingle Research Articles
Sakhanenko, Nikita A.
Galas, David J.
Biological Data Analysis as an Information Theory Problem: Multivariable Dependence Measures and the Shadows Algorithm
title Biological Data Analysis as an Information Theory Problem: Multivariable Dependence Measures and the Shadows Algorithm
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642827/
https://www.ncbi.nlm.nih.gov/pubmed/26335709
http://dx.doi.org/10.1089/cmb.2015.0051
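
The description above contrasts an exhaustive pass over all subsets of d variables, whose count is the binomial coefficient (n, d), with the reduced set of subsets that the "shadows" approach visits. As a rough illustration of the exhaustive baseline only, the Python sketch below counts those subsets and computes a plug-in Shannon entropy for each one. It is not the authors' shadows algorithm and it does not compute Δ(τ); the function names and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def exhaustive_subset_count(n, d):
    """Number of size-d subsets of n variables: the binomial coefficient C(n, d)."""
    return comb(n, d)


def joint_entropy(rows, columns):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of the chosen columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    total = len(rows)
    return -sum((k / total) * log2(k / total) for k in counts.values())


def all_subset_entropies(rows, d):
    """Entropy of every size-d subset of columns (the exhaustive baseline)."""
    n = len(rows[0])
    return {cols: joint_entropy(rows, cols) for cols in combinations(range(n), d)}


# Hypothetical toy data: columns 0 and 1 take every bit pair once,
# and column 2 is the XOR of columns 0 and 1.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

pair_entropies = all_subset_entropies(rows, 2)   # one entry per pair of columns
triple_entropy = joint_entropy(rows, (0, 1, 2))  # entropy of all three columns
```

For 1,000 variables at degree 3, `exhaustive_subset_count(1000, 3)` is 166,167,000 subsets, which is the kind of growth the description says the shadows approach is meant to avoid.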