Cargando…

Hyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrences

Obtaining insights in the tremendous amount of data in which the Big Data era has brought us, requires to develop specific tools, that are not only summaries of data through classical charts and tables, but that allow full navigation and browsing of a dataset. The proper modeling of databases can en...

Descripción completa

Detalles Bibliográficos
Autor principal: Ouvrard, Xavier Eric
Lenguaje:eng
Publicado: 2020
Materias:
Acceso en línea:http://cds.cern.ch/record/2721216
_version_ 1780965836948766720
author Ouvrard, Xavier Eric
author_facet Ouvrard, Xavier Eric
author_sort Ouvrard, Xavier Eric
collection CERN
description Obtaining insights in the tremendous amount of data in which the Big Data era has brought us, requires to develop specific tools, that are not only summaries of data through classical charts and tables, but that allow full navigation and browsing of a dataset. The proper modeling of databases can enable such navigation and we propose in this Thesis a methodology to achieve the browsing of an information space, through its different facets. To achieve the modeling of such an information space, co-occurrences of data instances are built referring to a common reference type. Historically, the co-occurrences were seen as pairwise relationships and developed as such. The move to hypergraphs enables the possibility to take into account the multi-adicity of the relationships, and to have a representation through the incident graph that simplifies deeply its 2-section. Nonetheless, representing large hypergraphs calls for a coarsening of the information by having insights on important vertices and hyperedges. One classical way to achieve it is to use a diffusion process over the network. Achieving it using an incident matrix is feasible but brings us to a pitfall, as it brings us back to a pairwise relationship. Making proper diffusion requires a tensor approach. This is well known for uniform hypergraphs, where all the hyperedges have same cardinality, but still very challenging for general hypergraphs. After redefining the concept of adjacency in general hypergraphs, we propose a first e-adjacency tensor, that involves a Hypergraph Uniformisation Process and a Polynomial Homogenization Process. This is achieved by uniformisation of the original hypergraph by decomposing it into layers—each of them containing a uniform hypergraph—and filling each layer with additional special vertices and merging them together. This process requires to have as many additional vertices as the number of layers. In order to reduce the number of special vertices, we need to have the possibility of re- peating a vertex when filling, which is not possible with hyperedges as they are sets. We need multisets. It, therefore, requires a new mathematical structure, that we have intro- duced and called hyper-bag-graph—hb-graph for short—, which is a family of multisets of a given universe. Co-occurrences can also have repetitions or individual weighting of their vertices inside a given co-occurrence and hb-graphs fit to handle it. Hence, we introduce a hb-graph framework for co-occurrence networks. We then work on diffusion on such structures, using, in a first step, a matrix approach. Aggregating the ranking of vertices and hb- edges of this diffusion on each of the facet of the information space is achieved by using a multi-diffusion scheme. Since different facets might have different focus of interest, we introduce a biased diffusion that enables a tuning on the point of emphasis on the feature we are interested in. Finally, coming back to e-adjacency tensor, we propose three e-adjacency tensors of hb- graphs, that are based on different ways of filling the hb-edges. The m-uniformisation that is achieved is evaluated and compared to the ones achieved by hb-edge splitting, concluding that any m-uniformisation process has an influence on the exchange-based diffusion that we propose. Hence, we conclude that diffusion using the tensor approach must be done in an informed manner to account for this diffusion change. We finally discuss different possible of achieving it and present a new Laplacian that can help to achieve it.
id cern-2721216
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2020
record_format invenio
spelling cern-27212162020-06-19T20:24:54Zhttp://cds.cern.ch/record/2721216engOuvrard, Xavier EricHyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrencesComputing and ComputersMathematical Physics and MathematicsObtaining insights in the tremendous amount of data in which the Big Data era has brought us, requires to develop specific tools, that are not only summaries of data through classical charts and tables, but that allow full navigation and browsing of a dataset. The proper modeling of databases can enable such navigation and we propose in this Thesis a methodology to achieve the browsing of an information space, through its different facets. To achieve the modeling of such an information space, co-occurrences of data instances are built referring to a common reference type. Historically, the co-occurrences were seen as pairwise relationships and developed as such. The move to hypergraphs enables the possibility to take into account the multi-adicity of the relationships, and to have a representation through the incident graph that simplifies deeply its 2-section. Nonetheless, representing large hypergraphs calls for a coarsening of the information by having insights on important vertices and hyperedges. One classical way to achieve it is to use a diffusion process over the network. Achieving it using an incident matrix is feasible but brings us to a pitfall, as it brings us back to a pairwise relationship. Making proper diffusion requires a tensor approach. This is well known for uniform hypergraphs, where all the hyperedges have same cardinality, but still very challenging for general hypergraphs. After redefining the concept of adjacency in general hypergraphs, we propose a first e-adjacency tensor, that involves a Hypergraph Uniformisation Process and a Polynomial Homogenization Process. This is achieved by uniformisation of the original hypergraph by decomposing it into layers—each of them containing a uniform hypergraph—and filling each layer with additional special vertices and merging them together. This process requires to have as many additional vertices as the number of layers. In order to reduce the number of special vertices, we need to have the possibility of re- peating a vertex when filling, which is not possible with hyperedges as they are sets. We need multisets. It, therefore, requires a new mathematical structure, that we have intro- duced and called hyper-bag-graph—hb-graph for short—, which is a family of multisets of a given universe. Co-occurrences can also have repetitions or individual weighting of their vertices inside a given co-occurrence and hb-graphs fit to handle it. Hence, we introduce a hb-graph framework for co-occurrence networks. We then work on diffusion on such structures, using, in a first step, a matrix approach. Aggregating the ranking of vertices and hb- edges of this diffusion on each of the facet of the information space is achieved by using a multi-diffusion scheme. Since different facets might have different focus of interest, we introduce a biased diffusion that enables a tuning on the point of emphasis on the feature we are interested in. Finally, coming back to e-adjacency tensor, we propose three e-adjacency tensors of hb- graphs, that are based on different ways of filling the hb-edges. The m-uniformisation that is achieved is evaluated and compared to the ones achieved by hb-edge splitting, concluding that any m-uniformisation process has an influence on the exchange-based diffusion that we propose. Hence, we conclude that diffusion using the tensor approach must be done in an informed manner to account for this diffusion change. We finally discuss different possible of achieving it and present a new Laplacian that can help to achieve it.CERN-THESIS-2020-048oai:cds.cern.ch:27212162020-06-18T13:43:59Z
spellingShingle Computing and Computers
Mathematical Physics and Mathematics
Ouvrard, Xavier Eric
Hyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrences
title Hyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrences
title_full Hyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrences
title_fullStr Hyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrences
title_full_unstemmed Hyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrences
title_short Hyper-bag-graphs and their applications: Modeling, Analyzing and Visualizing Complex Networks of Co-occurrences
title_sort hyper-bag-graphs and their applications: modeling, analyzing and visualizing complex networks of co-occurrences
topic Computing and Computers
Mathematical Physics and Mathematics
url http://cds.cern.ch/record/2721216
work_keys_str_mv AT ouvrardxaviereric hyperbaggraphsandtheirapplicationsmodelinganalyzingandvisualizingcomplexnetworksofcooccurrences