Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning

Bibliographic Details
Main authors: Seok, Junhee; Seon Kang, Yeong
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4456943/
https://www.ncbi.nlm.nih.gov/pubmed/26046461
http://dx.doi.org/10.1038/srep10981
Collection: PubMed
Description: Mutual information, a general measure of the relatedness between two random variables, has been actively used in the analysis of biomedical data. The mutual information between two discrete variables is conventionally calculated from their joint probabilities, which are estimated from the frequency of observed samples in each combination of variable categories. However, this conventional approach is no longer efficient for discrete variables with many categories, which are common in large-scale biomedical data such as diagnosis codes, drug compounds, and genotypes. Here, we propose a method that provides stable estimates of the mutual information between discrete variables with many categories. Simulation studies showed that, compared with the conventional calculation of mutual information, the proposed method reduced estimation errors 45-fold and improved the correlation coefficients with the true values 99-fold. The proposed method was also demonstrated through a case study of diagnostic data in electronic health records. This method is expected to be useful in the analysis of various biomedical data with discrete variables.
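The description summarizes the conventional calculation: joint and marginal probabilities are estimated from the observed frequencies of each category and category pair, then plugged into I(X;Y) = sum over (x, y) of p(x,y) log[p(x,y) / (p(x) p(y))]. The Python sketch below implements only that conventional plug-in estimate. It is not the authors' recursive adaptive partitioning method, whose details are not given in this record. The names `plugin_mutual_information` and `simulate_independent` are hypothetical, and the demo assumes uniformly distributed, independent variables so that the true mutual information is exactly zero. It relies on a well-known property of the plug-in estimator: at a fixed sample size, it tends to overestimate mutual information as the number of categories grows.

```python
import math
import random
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Conventional plug-in estimate of I(X;Y) in nats.

    Probabilities are estimated by the relative frequencies of the observed
    samples in each category (marginals) and each category pair (joint).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts for each observed (x, y) combination
    px = Counter(xs)              # marginal counts of X
    py = Counter(ys)              # marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y);
        # unobserved pairs have p(x,y) = 0 and contribute nothing.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def simulate_independent(num_categories, n_samples, seed=0):
    """Hypothetical demo: X and Y are drawn independently and uniformly,
    so the true mutual information is exactly 0."""
    rng = random.Random(seed)
    xs = [rng.randrange(num_categories) for _ in range(n_samples)]
    ys = [rng.randrange(num_categories) for _ in range(n_samples)]
    return plugin_mutual_information(xs, ys)


if __name__ == "__main__":
    # Same sample size, growing number of categories: the true MI stays 0,
    # but the plug-in estimate drifts upward.
    for k in (2, 10, 100):
        print(k, round(simulate_independent(k, 500), 4))
```

With 500 samples, the estimate for 2 categories per variable stays close to zero, while the estimate for 100 categories is well above zero even though X and Y are independent. This illustrates the instability with many categories that the description attributes to the conventional approach.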
ID: pubmed-4456943
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (Nature Publishing Group), published 2015-06-05
Rights: Copyright © 2015, Macmillan Publishers Limited. This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/