Complexity of possibly gapped histogram and analysis of histogram
We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through a data-driven, possibly gapped histogram, which further leads to the geometry-based analysis...
Main Authors: | Fushing, Hsieh Roy, Tania |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society Publishing 2018 |
Subjects: | Computer Science |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5830718/ https://www.ncbi.nlm.nih.gov/pubmed/29515829 http://dx.doi.org/10.1098/rsos.171026 |
_version_ | 1783303046119292928 |
---|---|
author | Fushing, Hsieh Roy, Tania |
author_facet | Fushing, Hsieh Roy, Tania |
author_sort | Fushing, Hsieh |
collection | PubMed |
description | We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through a data-driven, possibly gapped histogram, which further leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics because the ensemble of candidate histograms is captured by a two-layer Ising model. This construction is also a distinctive problem of information theory from the perspective of data compression via uniformity. By defining a Hamiltonian (or energy) as the sum of the total coding lengths of boundaries and the total decoding errors within bins, the problem of computing the minimum-energy macroscopic states is, surprisingly, resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. The first phase of ANOHT is developed for simultaneous comparison of multiple treatments, while the second phase of ANOHT is developed based on classical empirical process theory for a tree geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. In addition, a large baseball pitching dataset and heavily right-censored divorce data are analysed to showcase the existential gaps and utilities of ANOHT. |
format | Online Article Text |
id | pubmed-5830718 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | The Royal Society Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-5830718 2018-03-07 Complexity of possibly gapped histogram and analysis of histogram Fushing, Hsieh Roy, Tania R Soc Open Sci Computer Science The Royal Society Publishing 2018-02-28 /pmc/articles/PMC5830718/ /pubmed/29515829 http://dx.doi.org/10.1098/rsos.171026 Text en © 2018 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited. |
spellingShingle | Computer Science Fushing, Hsieh Roy, Tania Complexity of possibly gapped histogram and analysis of histogram |
title | Complexity of possibly gapped histogram and analysis of histogram |
title_sort | complexity of possibly gapped histogram and analysis of histogram |
topic | Computer Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5830718/ https://www.ncbi.nlm.nih.gov/pubmed/29515829 http://dx.doi.org/10.1098/rsos.171026 |
work_keys_str_mv | AT fushinghsieh complexityofpossiblygappedhistogramandanalysisofhistogram AT roytania complexityofpossiblygappedhistogramandanalysisofhistogram |