Mimicking Complexity of Structured Data Matrix’s Information Content: Categorical Exploratory Data Analysis

We develop Categorical Exploratory Data Analysis (CEDA) with mimicking to explore and exhibit the complexity of information content that is contained within any data matrix: categorical, discrete, or continuous. Such complexity is shown through visible and explainable serial multiscale structural de...

Bibliographic Details
Main authors:	Hsieh, Fushing; Chou, Elizabeth P.; Chen, Ting-Li
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8151017/ https://www.ncbi.nlm.nih.gov/pubmed/34064857 http://dx.doi.org/10.3390/e23050594

author	Hsieh, Fushing; Chou, Elizabeth P.; Chen, Ting-Li
collection	PubMed
description	We develop Categorical Exploratory Data Analysis (CEDA) with mimicking to explore and exhibit the complexity of information content that is contained within any data matrix: categorical, discrete, or continuous. Such complexity is shown through visible and explainable serial multiscale structural dependency with heterogeneity. CEDA is developed upon all features’ categorical nature via histograms, and it is guided by all features’ associative patterns (order-2 dependence) in a mutual conditional entropy matrix. Higher-order structural dependency of k (≥ 3) features is exhibited through block patterns within heatmaps that are constructed by permuting contingency-kD-lattices of counts. As k grows, the resultant heatmap series contains global and large scales of structural dependency that constitute the data matrix’s information content. When continuous features are involved, principal component analysis (PCA) extracts fine-scale information content from each block in the final heatmap. Our mimicking protocol coherently simulates this heatmap series by preserving global-to-fine-scale structural dependency. At every step of the mimicking process, each accepted simulated heatmap is subject to constraints with respect to all of the reliable observed categorical patterns. For reliability and robustness in the sciences, CEDA with mimicking enhances data visualization by revealing deterministic and stochastic structures within each scale-specific structural dependency. For inferences in Machine Learning (ML) and Statistics, it clarifies on which scales which covariate feature-groups have major versus minor predictive powers on response features. For the social justice of Artificial Intelligence (AI) products, it checks whether a data matrix incompletely prescribes the targeted system.
format	Online Article Text
id	pubmed-8151017
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8151017 2021-05-27 Mimicking Complexity of Structured Data Matrix’s Information Content: Categorical Exploratory Data Analysis. Hsieh, Fushing; Chou, Elizabeth P.; Chen, Ting-Li. Entropy (Basel). Article. MDPI 2021-05-11 /pmc/articles/PMC8151017/ /pubmed/34064857 http://dx.doi.org/10.3390/e23050594 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Mimicking Complexity of Structured Data Matrix’s Information Content: Categorical Exploratory Data Analysis
