Dynamic Mixed Data Analysis and Visualization
One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and...
Main authors: | Grané, Aurea; Manzi, Giancarlo; Salini, Silvia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601934/ https://www.ncbi.nlm.nih.gov/pubmed/37420419 http://dx.doi.org/10.3390/e24101399 |
_version_ | 1784817187080372224 |
---|---|
author | Grané, Aurea Manzi, Giancarlo Salini, Silvia |
author_facet | Grané, Aurea Manzi, Giancarlo Salini, Silvia |
author_sort | Grané, Aurea |
collection | PubMed |
description | One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time [Formula: see text], we start by measuring the proximity of n individuals in heterogeneous data by means of a robustified version of Gower’s metric (proposed by the authors in a previous work), yielding a collection of distance matrices [Formula: see text]. To monitor the evolution of distances and outlier detection over time, we propose several graphical tools: first, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify individuals that showed minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and detect potential outliers, we use proximity plots, which are line graphs based on a proximity function computed on [Formula: see text]; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in a Shiny application in R, and the methodology is illustrated on a real data set on COVID-19 healthcare, policy and restriction measures during the 2020–2021 COVID-19 pandemic across EU Member States. |
format | Online Article Text |
id | pubmed-9601934 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9601934 2022-10-27 Dynamic Mixed Data Analysis and Visualization Grané, Aurea Manzi, Giancarlo Salini, Silvia Entropy (Basel) Article One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time [Formula: see text], we start by measuring the proximity of n individuals in heterogeneous data by means of a robustified version of Gower’s metric (proposed by the authors in a previous work), yielding a collection of distance matrices [Formula: see text]. To monitor the evolution of distances and outlier detection over time, we propose several graphical tools: first, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify individuals that showed minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and detect potential outliers, we use proximity plots, which are line graphs based on a proximity function computed on [Formula: see text]; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in a Shiny application in R, and the methodology is illustrated on a real data set on COVID-19 healthcare, policy and restriction measures during the 2020–2021 COVID-19 pandemic across EU Member States. MDPI 2022-10-01 /pmc/articles/PMC9601934/ /pubmed/37420419 http://dx.doi.org/10.3390/e24101399 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Grané, Aurea Manzi, Giancarlo Salini, Silvia Dynamic Mixed Data Analysis and Visualization |
title | Dynamic Mixed Data Analysis and Visualization |
title_full | Dynamic Mixed Data Analysis and Visualization |
title_fullStr | Dynamic Mixed Data Analysis and Visualization |
title_full_unstemmed | Dynamic Mixed Data Analysis and Visualization |
title_short | Dynamic Mixed Data Analysis and Visualization |
title_sort | dynamic mixed data analysis and visualization |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601934/ https://www.ncbi.nlm.nih.gov/pubmed/37420419 http://dx.doi.org/10.3390/e24101399 |
work_keys_str_mv | AT graneaurea dynamicmixeddataanalysisandvisualization AT manzigiancarlo dynamicmixeddataanalysisandvisualization AT salinisilvia dynamicmixeddataanalysisandvisualization |
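
The pipeline summarized in the description (one distance matrix per time point, then pairwise-distance series, a per-individual proximity summary, and one multidimensional scaling map per time point) might be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the plain Gower dissimilarity (one minus Gower's similarity), not the authors' robustified version; the "mean distance to the others" summary is a stand-in chosen here, since the record does not define the proximity function; and the toy data and the helper names `gower_distance` and `classical_mds` are hypothetical. The article's tools were implemented in R, so Python is used purely for illustration.

```python
import numpy as np


def gower_distance(num, cat):
    """Plain (non-robust) Gower dissimilarity between n individuals.

    num: (n, p) numeric variables; cat: (n, q) categorical variables.
    Assumption: this is the classical coefficient, not the robustified
    version referred to in the abstract.
    """
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    # Numeric part: absolute difference scaled by each variable's range.
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute zero distance
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges
    # Categorical part: 0 if the categories match, 1 otherwise.
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    # Average the per-variable dissimilarities over all p + q variables.
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS coordinates from a distance matrix d."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:k]             # k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Hypothetical toy data: n individuals observed at T time points.
rng = np.random.default_rng(0)
T, n = 3, 4
D = np.stack([
    gower_distance(rng.normal(size=(n, 2)), rng.integers(0, 3, size=(n, 1)))
    for _ in range(T)
])                                               # shape (T, n, n)

pair_series = D[:, 0, 1]                         # distance of pair (0, 1) over time
mean_to_others = D.sum(axis=2) / (n - 1)         # assumed proximity summary, (T, n)
mds_maps = [classical_mds(D[t]) for t in range(T)]  # one 2-D map per time point
```

Plotting `pair_series` against time would give one line of a pairwise-distance line graph, and plotting each column of `mean_to_others` would give one line per individual in a proximity-style plot.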