
Dynamic Mixed Data Analysis and Visualization


Bibliographic Details
Main authors: Grané, Aurea, Manzi, Giancarlo, Salini, Silvia
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601934/
https://www.ncbi.nlm.nih.gov/pubmed/37420419
http://dx.doi.org/10.3390/e24101399
Description
Summary: One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time [Formula: see text], we start by measuring the proximity of n individuals in heterogeneous data by means of a robustified version of Gower’s metric (proposed by the authors in a previous work), yielding a collection of distance matrices [Formula: see text]. To monitor the evolution of distances and to detect outliers over time, we propose several graphical tools: first, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify the individuals that show minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and to detect potential outliers, we use proximity plots, which are line graphs based on a proximity function computed on [Formula: see text]; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in a Shiny application in R, and the methodology is illustrated on a real data set on healthcare, policy and restriction measures related to the 2020–2021 COVID-19 pandemic across EU Member States.
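The first steps of the protocol summarized above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: it uses the classic (non-robustified) Gower distance, the toy data, the variable names `cases` and `lockdown`, and the helper names are all hypothetical, and the median distance to the other individuals is only an assumed stand-in for the proximity function mentioned in the summary, whose definition is not given here.

```python
from statistics import median


def gower_matrix(rows, numeric_keys, categorical_keys):
    """Classic Gower distance matrix for one time point.

    Assumption: plain range scaling is used; the robustified variant
    described in the summary is NOT reproduced here.
    """
    n = len(rows)
    # Range of each numeric variable, used to scale differences to [0, 1].
    ranges = {}
    for key in numeric_keys:
        values = [row[key] for row in rows]
        ranges[key] = max(values) - min(values)

    n_vars = len(numeric_keys) + len(categorical_keys)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for key in numeric_keys:
                if ranges[key] > 0:
                    total += abs(rows[i][key] - rows[j][key]) / ranges[key]
            for key in categorical_keys:
                # Simple matching: 0 if categories agree, 1 otherwise.
                total += 0.0 if rows[i][key] == rows[j][key] else 1.0
            dist[i][j] = dist[j][i] = total / n_vars
    return dist


def median_distance_to_others(dist):
    """Hypothetical proximity summary: median distance to all other individuals."""
    n = len(dist)
    return [median([dist[i][j] for j in range(n) if j != i]) for i in range(n)]


# Hypothetical toy data: three individuals observed at two time points.
names = ["A", "B", "C"]
snapshots = [
    [{"cases": 10, "lockdown": "yes"},
     {"cases": 20, "lockdown": "yes"},
     {"cases": 30, "lockdown": "no"}],
    [{"cases": 10, "lockdown": "no"},
     {"cases": 10, "lockdown": "no"},
     {"cases": 50, "lockdown": "yes"}],
]

# One distance matrix per time point.
matrices = [gower_matrix(rows, ["cases"], ["lockdown"]) for rows in snapshots]

# Pairwise distance trajectories: the series a line graph would display.
trajectories = {
    (names[i], names[j]): [dist[i][j] for dist in matrices]
    for i in range(len(names))
    for j in range(i + 1, len(names))
}

# Per-individual proximity summary over time: the series a proximity plot would display.
proximity = [median_distance_to_others(dist) for dist in matrices]

if __name__ == "__main__":
    for pair, series in trajectories.items():
        print(pair, series)
    for t, values in enumerate(proximity, start=1):
        print("time", t, dict(zip(names, values)))
```

In this toy example individual C has the largest summary value at both time points, which is the kind of systematically distant individual the proximity plots are meant to expose.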