Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning
Mutual information, a general measure of the relatedness between two random variables, has been actively used in the analysis of biomedical data. The mutual information between two discrete variables is conventionally calculated by their joint probabilities estimated from the frequency of observed s...
Main Authors: | Seok, Junhee, Seon Kang, Yeong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4456943/ https://www.ncbi.nlm.nih.gov/pubmed/26046461 http://dx.doi.org/10.1038/srep10981 |
_version_ | 1782374911478595584 |
---|---|
author | Seok, Junhee Seon Kang, Yeong |
author_facet | Seok, Junhee Seon Kang, Yeong |
author_sort | Seok, Junhee |
collection | PubMed |
description | Mutual information, a general measure of the relatedness between two random variables, has been actively used in the analysis of biomedical data. The mutual information between two discrete variables is conventionally calculated by their joint probabilities estimated from the frequency of observed samples in each combination of variable categories. However, this conventional approach is no longer efficient for discrete variables with many categories, which can be easily found in large-scale biomedical data such as diagnosis codes, drug compounds, and genotypes. Here, we propose a method to provide stable estimations for the mutual information between discrete variables with many categories. Simulation studies showed that the proposed method reduced the estimation errors 45-fold and improved the correlation coefficients with true values 99-fold, compared with the conventional calculation of mutual information. The proposed method was also demonstrated through a case study for diagnostic data in electronic health records. This method is expected to be useful in the analysis of various biomedical data with discrete variables. |
format | Online Article Text |
id | pubmed-4456943 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-4456943 2015-06-12 Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning Seok, Junhee Seon Kang, Yeong Sci Rep Article Mutual information, a general measure of the relatedness between two random variables, has been actively used in the analysis of biomedical data. The mutual information between two discrete variables is conventionally calculated by their joint probabilities estimated from the frequency of observed samples in each combination of variable categories. However, this conventional approach is no longer efficient for discrete variables with many categories, which can be easily found in large-scale biomedical data such as diagnosis codes, drug compounds, and genotypes. Here, we propose a method to provide stable estimations for the mutual information between discrete variables with many categories. Simulation studies showed that the proposed method reduced the estimation errors 45-fold and improved the correlation coefficients with true values 99-fold, compared with the conventional calculation of mutual information. The proposed method was also demonstrated through a case study for diagnostic data in electronic health records. This method is expected to be useful in the analysis of various biomedical data with discrete variables. Nature Publishing Group 2015-06-05 /pmc/articles/PMC4456943/ /pubmed/26046461 http://dx.doi.org/10.1038/srep10981 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Seok, Junhee Seon Kang, Yeong Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning |
title | Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning |
title_full | Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning |
title_fullStr | Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning |
title_full_unstemmed | Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning |
title_short | Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning |
title_sort | mutual information between discrete variables with many categories using recursive adaptive partitioning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4456943/ https://www.ncbi.nlm.nih.gov/pubmed/26046461 http://dx.doi.org/10.1038/srep10981 |
work_keys_str_mv | AT seokjunhee mutualinformationbetweendiscretevariableswithmanycategoriesusingrecursiveadaptivepartitioning AT seonkangyeong mutualinformationbetweendiscretevariableswithmanycategoriesusingrecursiveadaptivepartitioning |