
Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data

Mapping network nodes and edges to communities and network functions is crucial to gaining a higher level of understanding of the network structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions t...


Bibliographic Details
Main authors: Mandviwalla, Aamir; Elsisy, Amr; Atique, Muhammad Saad; Kuzmin, Konstantin; Gaiteri, Chris; Szymanski, Boleslaw K.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10453411/
https://www.ncbi.nlm.nih.gov/pubmed/37628148
http://dx.doi.org/10.3390/e25081118
_version_ 1785095929624264704
author Mandviwalla, Aamir
Elsisy, Amr
Atique, Muhammad Saad
Kuzmin, Konstantin
Gaiteri, Chris
Szymanski, Boleslaw K.
author_facet Mandviwalla, Aamir
Elsisy, Amr
Atique, Muhammad Saad
Kuzmin, Konstantin
Gaiteri, Chris
Szymanski, Boleslaw K.
author_sort Mandviwalla, Aamir
collection PubMed
description Mapping network nodes and edges to communities and network functions is crucial to gaining a higher level of understanding of the network structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions to protect important members from attacks or arrests. Here, we focus on correctly inferring the structures and functions of such networks, but our methodology can be broadly applied. Without the ground truth, knowledge about the allocation of nodes to communities and network functions, no single network based on the noisy data can represent all plausible communities and functions of the true underlying network. To address this limitation, we apply a generative model that randomly distorts the original network based on the noisy data, generating a pool of statistically equivalent networks. Each unique generated network is recorded, while each duplicate of the already recorded network just increases the repetition count of that network. We treat each such network as a variant of the ground truth with the probability of arising in the real world approximated by the ratio of the count of this network’s duplicates plus one to the total number of all generated networks. Communities of variants with frequently occurring duplicates contain persistent patterns shared by their structures. Using Shannon entropy, we can find a variant that minimizes the uncertainty for operations planned on the network. Repeatedly generating new pools of networks from the best network of the previous step for several steps lowers the entropy of the best new variant. If the entropy is too high, the network operators can identify nodes, the monitoring of which can achieve the most significant reduction in entropy. 
Finally, we also present a heuristic for constructing a new variant, which is not randomly generated but has the lowest expected cost of operating on the distorted mappings of network nodes to communities and functions caused by noisy data.
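The pooling and counting steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the record does not specify the generative model, so independent edge flips with probability `flip_prob` stand in for it, and the entropy is taken over the distribution of variants in the pool, which may differ from the entropy measure the authors use. All function names here are hypothetical.

```python
import math
import random
from collections import Counter


def distort(edges, nodes, flip_prob, rng):
    """Return one variant: every node pair toggles its edge with probability flip_prob.

    Assumption: independent flips stand in for the paper's generative model.
    """
    variant = set(edges)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if rng.random() < flip_prob:
                variant ^= {(u, v)}  # add the edge if absent, remove it if present
    return frozenset(variant)


def variant_pool(edges, nodes, n_samples, flip_prob=0.1, seed=0):
    """Generate n_samples variants and count how often each unique one occurs.

    Each count equals the number of duplicates plus one (the first occurrence).
    """
    rng = random.Random(seed)
    ordered = sorted(nodes)
    base = {tuple(sorted(e)) for e in edges}  # canonical (low, high) edge form
    return Counter(distort(base, ordered, flip_prob, rng) for _ in range(n_samples))


def variant_probabilities(counts):
    """Approximate each variant's probability as (duplicates + 1) / total generated."""
    total = sum(counts.values())
    return {variant: c / total for variant, c in counts.items()}


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    counts = variant_pool([(1, 2), (2, 3), (3, 4)], [1, 2, 3, 4], n_samples=1000)
    probs = variant_probabilities(counts)
    most_common, _ = counts.most_common(1)[0]
    print(sorted(most_common), probs[most_common])
    print(shannon_entropy(probs.values()))
```

Storing each variant as a `frozenset` of canonical edges makes duplicate detection a dictionary lookup, so the `Counter` value for a variant is directly the numerator of the probability estimate given in the abstract.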
format Online
Article
Text
id pubmed-10453411
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10453411 2023-08-26 Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data Mandviwalla, Aamir Elsisy, Amr Atique, Muhammad Saad Kuzmin, Konstantin Gaiteri, Chris Szymanski, Boleslaw K. Entropy (Basel) Article MDPI 2023-07-26 /pmc/articles/PMC10453411/ /pubmed/37628148 http://dx.doi.org/10.3390/e25081118 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Mandviwalla, Aamir
Elsisy, Amr
Atique, Muhammad Saad
Kuzmin, Konstantin
Gaiteri, Chris
Szymanski, Boleslaw K.
Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data
title Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data
title_full Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data
title_fullStr Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data
title_full_unstemmed Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data
title_short Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data
title_sort network analytics enabled by generating a pool of network variants from noisy data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10453411/
https://www.ncbi.nlm.nih.gov/pubmed/37628148
http://dx.doi.org/10.3390/e25081118
work_keys_str_mv AT mandviwallaaamir networkanalyticsenabledbygeneratingapoolofnetworkvariantsfromnoisydata
AT elsisyamr networkanalyticsenabledbygeneratingapoolofnetworkvariantsfromnoisydata
AT atiquemuhammadsaad networkanalyticsenabledbygeneratingapoolofnetworkvariantsfromnoisydata
AT kuzminkonstantin networkanalyticsenabledbygeneratingapoolofnetworkvariantsfromnoisydata
AT gaiterichris networkanalyticsenabledbygeneratingapoolofnetworkvariantsfromnoisydata
AT szymanskiboleslawk networkanalyticsenabledbygeneratingapoolofnetworkvariantsfromnoisydata