
Complexity of possibly gapped histogram and analysis of histogram


Bibliographic Details
Main authors: Fushing, Hsieh; Roy, Tania
Format: Online Article Text
Language: English
Published: The Royal Society Publishing, 2018
Subjects: Computer Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5830718/
https://www.ncbi.nlm.nih.gov/pubmed/29515829
http://dx.doi.org/10.1098/rsos.171026
collection PubMed
description We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through a data-driven, possibly gapped histogram, which further leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics, because the ensemble of candidate histograms is captured by a two-layer Ising model. This construction is also a distinctive problem of information theory from the perspective of data compression via uniformity. By defining a Hamiltonian (or energy) as the sum of the total coding lengths of boundaries and the total decoding errors within bins, the problem of computing the minimum-energy macroscopic states is, surprisingly, resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. The first phase of ANOHT is then developed for simultaneous comparison of multiple treatments, while the second phase of ANOHT is developed, based on classical empirical process theory, for a tree geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. Also, a large baseball pitching dataset and a heavily right-censored divorce dataset are analysed to showcase the existential gaps and utilities of ANOHT.
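The abstract describes the histogram construction only at a high level: an energy that adds the coding length of bin boundaries to the decoding error within bins, minimized with hierarchical clustering. The following is a minimal sketch under stated assumptions, not the authors' method. It uses single-linkage clustering on sorted 1-D data (where each merge joins the two adjacent groups separated by the smallest gap), charges a hypothetical fixed `boundary_cost` per bin edge, and uses a hypothetical `uniformity_error`, the squared deviation of a bin's sorted points from evenly spaced positions between its edges, as the decoding error. All function and parameter names are illustrative.

```python
from typing import List, Tuple

# A bin is (left edge, right edge, count); gaps are the spaces between bins.
Bin = Tuple[float, float, int]


def uniformity_error(points: List[float]) -> float:
    """Hypothetical 'decoding error' of one bin (an assumption, not the paper's
    definition): squared deviation of its sorted points from evenly spaced
    positions between its two edges."""
    n = len(points)
    if n < 2:
        return 0.0
    lo, hi = points[0], points[-1]
    return sum((x - (lo + (hi - lo) * i / (n - 1))) ** 2
               for i, x in enumerate(points))


def single_linkage_levels(data: List[float]) -> List[List[List[float]]]:
    """All levels of a 1-D single-linkage dendrogram, finest first.
    In 1-D the closest pair of groups is always an adjacent pair, so each
    merge joins the two neighbouring groups separated by the smallest gap."""
    groups = [[x] for x in sorted(data)]
    levels = [groups]
    while len(groups) > 1:
        j = min(range(len(groups) - 1),
                key=lambda i: groups[i + 1][0] - groups[i][-1])
        # Concatenating adjacent sorted groups keeps every group sorted.
        groups = groups[:j] + [groups[j] + groups[j + 1]] + groups[j + 2:]
        levels.append(groups)
    return levels


def energy(groups: List[List[float]], boundary_cost: float) -> float:
    """Hypothetical energy: a fixed coding cost for each bin's two edges
    plus the summed within-bin uniformity errors."""
    return (boundary_cost * 2 * len(groups)
            + sum(uniformity_error(g) for g in groups))


def gapped_histogram(data: List[float], boundary_cost: float = 1.0) -> List[Bin]:
    """Pick the dendrogram level with the lowest energy (ties go to the finer level)."""
    best = min(single_linkage_levels(data),
               key=lambda gs: energy(gs, boundary_cost))
    return [(g[0], g[-1], len(g)) for g in best]


if __name__ == "__main__":
    sample = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3]
    print(gapped_histogram(sample))  # prints [(1.0, 1.3, 4), (5.0, 5.3, 4)]
```

With the default cost, the sample yields two bins and leaves the interval between 1.3 and 5.0 as a gap. Raising `boundary_cost` favours fewer, wider bins; lowering it toward zero favours the finest partition.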
id pubmed-5830718
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: R Soc Open Sci (Computer Science), published 2018-02-28 by The Royal Society Publishing; record pubmed-5830718 (2018-03-07). Text in English.
Rights: © 2018 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited.
topic Computer Science