Multivariate Analysis of Data Sets with Missing Values: An Information Theory-Based Reliability Function

Bibliographic Details
Main authors: Uechi, Lisa; Galas, David J.; Sakhanenko, Nikita A.
Format: Online, Article, Text
Language: English
Published: Mary Ann Liebert, Inc., publishers, 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6383577/
https://www.ncbi.nlm.nih.gov/pubmed/30495984
http://dx.doi.org/10.1089/cmb.2018.0179
collection PubMed
description Missing values in complex biological data sets have significant impacts on our ability to correctly detect and quantify interactions in biological systems and to infer relationships accurately. In this article, we propose a useful metaphor to show that information theory measures, such as mutual information and interaction information, can be employed directly for evaluating multivariable dependencies even if data contain some missing values. The metaphor is that of thinking of variable dependencies as information channels between and among variables. In this view, missing data can be thought of as noise that reduces the channel capacity in predictable ways. We extract the available information in the data even if there are missing values and use the notion of channel capacity to assess the reliability of the result. This avoids the common practice—in the absence of prior knowledge of random imputation—of eliminating samples entirely, thus losing the information they can provide. We show how this reliability function can be implemented for pairs of variables, and generalize it for an arbitrary number of variables. Illustrations of the reliability functions for several cases are provided using simulated data.
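The channel metaphor in the description can be made concrete with a small example. This is a minimal sketch, not the authors' reliability function (the description does not define it). It assumes discrete variables, uses `None` to mark a missing value, and estimates mutual information with a plug-in (frequency-count) estimator. The helper names `plugin_mi`, `complete_pair_mi`, and `erasure_mi` are hypothetical. The sketch contrasts two uses of the same incomplete data. The first estimates from every sample in which both values are present rather than discarding whole samples. The second keeps each missing value as an explicit erasure symbol, as in an erasure channel.

```python
from collections import Counter
from math import log2

MISSING = None  # marker for a missing observation (an assumption of this sketch)


def plugin_mi(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum over observed cells of p(x,y) * log2(p(x,y) / (p(x) p(y))),
    # written with raw counts: (c/n) * log2(c*n / (count_x * count_y)).
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def complete_pair_mi(xs, ys):
    """Estimate I(X;Y) from every sample where both values are present.

    Returns the estimate and the number of samples it used.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not MISSING and y is not MISSING]
    return plugin_mi(pairs), len(pairs)


def erasure_mi(xs, ys, symbol="?"):
    """Estimate I(X;Y') with each missing value kept as an erasure symbol."""
    def fill(v):
        return symbol if v is MISSING else v
    return plugin_mi([(fill(x), fill(y)) for x, y in zip(xs, ys)])


xs = [0, 1, 0, 1, 0, 1, 0, 1]
ys = [None, None, 0, 1, 0, 1, 0, 1]  # Y is missing in 2 of 8 samples
mi, used = complete_pair_mi(xs, ys)
print(mi, used)            # 1.0 6
print(erasure_mi(xs, ys))  # 0.75
```

In this example Y copies X, so the full data would carry 1 bit. The complete-pair estimate still finds 1 bit from the 6 usable samples. The erasure view gives 0.75 bits, which matches the erasure-channel identity I(X;Y') = (1 - ε) I(X;Y) for erasures independent of X and Y, here with ε = 2/8. Plug-in estimates are biased upward on small samples, so this illustrates the idea rather than providing a reliability measure.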
id pubmed-6383577
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: J Comput Biol (Research Articles), Mary Ann Liebert, Inc., publishers; dates 2019-02-01 and 2019-02-06 (record pubmed-6383577, 2019-02-22)
License: © Lisa Uechi, et al., 2018; Published by Mary Ann Liebert, Inc. This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are cited.
topic Research Articles