
Multivariate Pointwise Information-Driven Data Sampling and Visualization

With increasing computing capabilities of modern supercomputers, the size of the data generated from the scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while...


Bibliographic Details
Main Authors: Dutta, Soumya; Biswas, Ayan; Ahrens, James
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515213/
https://www.ncbi.nlm.nih.gov/pubmed/33267413
http://dx.doi.org/10.3390/e21070699
_version_ 1783586767643869184
author Dutta, Soumya
Biswas, Ayan
Ahrens, James
collection PubMed
description With increasing computing capabilities of modern supercomputers, the size of the data generated by scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. While analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to obtain a better understanding of the characteristics of the data features. Therefore, data summarization techniques are required to analyze multi-variable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, in this work, we propose a data sub-sampling algorithm for performing statistical data summarization that leverages pointwise information theoretic measures to quantify the statistical association of data points across multiple variables and generates sub-sampled data that preserve the statistical association among those variables. Using such reduced sampled data, we show that multivariate feature query and analysis can be done effectively. The efficacy of the proposed multivariate association driven sampling algorithm is demonstrated by applying it to several scientific data sets.
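As an illustration of the idea in the description, the following is a minimal sketch of association-driven sub-sampling for two variables. It is not the authors' implementation. It assumes pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), as the pointwise measure, estimates the probabilities from equal-width histograms, and draws a weighted sample that favours high-PMI points. The function names, the bin count, the weight shift, and the sampling scheme are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def pointwise_mutual_information(xs, ys, bins=8):
    """Estimate PMI(x, y) = log2(p(x, y) / (p(x) p(y))) for every data point."""
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    bx = [bin_index(x, x_lo, x_hi, bins) for x in xs]
    by = [bin_index(y, y_lo, y_hi, bins) for y in ys]
    n = len(xs)
    joint, count_x, count_y = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # With counts c, p(x, y) / (p(x) p(y)) simplifies to n * c_xy / (c_x * c_y).
    return [
        math.log2(n * joint[(i, j)] / (count_x[i] * count_y[j]))
        for i, j in zip(bx, by)
    ]


def pmi_weighted_sample(xs, ys, fraction=0.25, bins=8, seed=0):
    """Return sorted indices of a sub-sample that favours high-PMI points."""
    rng = random.Random(seed)
    pmi = pointwise_mutual_information(xs, ys, bins)
    low = min(pmi)
    # Assumption: shift PMI so all weights are positive; the small epsilon
    # keeps even the lowest-PMI points selectable.
    weights = [p - low + 1e-9 for p in pmi]
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each point gets the key u ** (1 / w) and the largest keys are kept.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    n_keep = max(1, round(fraction * len(xs)))
    ranked = sorted(range(len(xs)), key=keys.__getitem__, reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    xs = [i / 99 for i in range(100)]
    ys = [x * x for x in xs]  # a strongly associated second variable
    kept = pmi_weighted_sample(xs, ys, fraction=0.2, bins=4)
    print(len(kept), "of", len(xs), "points kept")
```

Points in bins where the two variables co-occur more often than independence would predict receive positive PMI and are more likely to be kept. Extending the sketch to more than two variables would need a multivariate pointwise measure, which is left out here.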
format Online
Article
Text
id pubmed-7515213
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7515213 2020-11-09 Multivariate Pointwise Information-Driven Data Sampling and Visualization Dutta, Soumya Biswas, Ayan Ahrens, James Entropy (Basel) Article MDPI 2019-07-16 /pmc/articles/PMC7515213/ /pubmed/33267413 http://dx.doi.org/10.3390/e21070699 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Multivariate Pointwise Information-Driven Data Sampling and Visualization
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515213/
https://www.ncbi.nlm.nih.gov/pubmed/33267413
http://dx.doi.org/10.3390/e21070699