Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data
Mapping network nodes and edges to communities and network functions is crucial to gaining a higher level of understanding of the network structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions t...
Main Authors: | Mandviwalla, Aamir; Elsisy, Amr; Atique, Muhammad Saad; Kuzmin, Konstantin; Gaiteri, Chris; Szymanski, Boleslaw K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10453411/ https://www.ncbi.nlm.nih.gov/pubmed/37628148 http://dx.doi.org/10.3390/e25081118 |
_version_ | 1785095929624264704 |
---|---|
author | Mandviwalla, Aamir Elsisy, Amr Atique, Muhammad Saad Kuzmin, Konstantin Gaiteri, Chris Szymanski, Boleslaw K. |
author_sort | Mandviwalla, Aamir |
collection | PubMed |
description | Mapping network nodes and edges to communities and network functions is crucial to gaining a higher level of understanding of the network structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions to protect important members from attacks or arrests. Here, we focus on correctly inferring the structures and functions of such networks, but our methodology can be broadly applied. Without the ground truth, knowledge about the allocation of nodes to communities and network functions, no single network based on the noisy data can represent all plausible communities and functions of the true underlying network. To address this limitation, we apply a generative model that randomly distorts the original network based on the noisy data, generating a pool of statistically equivalent networks. Each unique generated network is recorded, while each duplicate of the already recorded network just increases the repetition count of that network. We treat each such network as a variant of the ground truth with the probability of arising in the real world approximated by the ratio of the count of this network’s duplicates plus one to the total number of all generated networks. Communities of variants with frequently occurring duplicates contain persistent patterns shared by their structures. Using Shannon entropy, we can find a variant that minimizes the uncertainty for operations planned on the network. Repeatedly generating new pools of networks from the best network of the previous step for several steps lowers the entropy of the best new variant. If the entropy is too high, the network operators can identify nodes, the monitoring of which can achieve the most significant reduction in entropy. Finally, we also present a heuristic for constructing a new variant, which is not randomly generated but has the lowest expected cost of operating on the distorted mappings of network nodes to communities and functions caused by noisy data. |
format | Online Article Text |
id | pubmed-10453411 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10453411 2023-08-26 Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data Mandviwalla, Aamir Elsisy, Amr Atique, Muhammad Saad Kuzmin, Konstantin Gaiteri, Chris Szymanski, Boleslaw K. Entropy (Basel) Article Mapping network nodes and edges to communities and network functions is crucial to gaining a higher level of understanding of the network structure and functions. Such mappings are particularly challenging to design for covert social networks, which intentionally hide their structure and functions to protect important members from attacks or arrests. Here, we focus on correctly inferring the structures and functions of such networks, but our methodology can be broadly applied. Without the ground truth, knowledge about the allocation of nodes to communities and network functions, no single network based on the noisy data can represent all plausible communities and functions of the true underlying network. To address this limitation, we apply a generative model that randomly distorts the original network based on the noisy data, generating a pool of statistically equivalent networks. Each unique generated network is recorded, while each duplicate of the already recorded network just increases the repetition count of that network. We treat each such network as a variant of the ground truth with the probability of arising in the real world approximated by the ratio of the count of this network’s duplicates plus one to the total number of all generated networks. Communities of variants with frequently occurring duplicates contain persistent patterns shared by their structures. Using Shannon entropy, we can find a variant that minimizes the uncertainty for operations planned on the network. Repeatedly generating new pools of networks from the best network of the previous step for several steps lowers the entropy of the best new variant. If the entropy is too high, the network operators can identify nodes, the monitoring of which can achieve the most significant reduction in entropy. Finally, we also present a heuristic for constructing a new variant, which is not randomly generated but has the lowest expected cost of operating on the distorted mappings of network nodes to communities and functions caused by noisy data. MDPI 2023-07-26 /pmc/articles/PMC10453411/ /pubmed/37628148 http://dx.doi.org/10.3390/e25081118 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Mandviwalla, Aamir Elsisy, Amr Atique, Muhammad Saad Kuzmin, Konstantin Gaiteri, Chris Szymanski, Boleslaw K. Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data |
title | Network Analytics Enabled by Generating a Pool of Network Variants from Noisy Data |
title_sort | network analytics enabled by generating a pool of network variants from noisy data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10453411/ https://www.ncbi.nlm.nih.gov/pubmed/37628148 http://dx.doi.org/10.3390/e25081118 |
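
The pool-of-variants idea summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not specify the generative model, so the edge-toggling noise model, the `flip_prob` parameter, and all function names here are assumptions made for illustration. The sketch records each unique variant, counts its occurrences (duplicates plus one), approximates each variant's probability as that count divided by the pool size, and computes the Shannon entropy of the resulting distribution.

```python
import math
import random
from collections import Counter
from itertools import combinations


def distort(edges, nodes, flip_prob, rng):
    """Return a distorted copy of `edges` as a frozenset.

    Assumed noise model (hypothetical): every possible node pair is
    toggled (edge added or removed) with probability `flip_prob`.
    """
    variant = set(edges)
    for pair in combinations(sorted(nodes), 2):
        if rng.random() < flip_prob:
            variant ^= {pair}  # toggle membership of this edge
    return frozenset(variant)


def generate_pool(edges, nodes, n_samples, flip_prob=0.05, seed=0):
    """Generate `n_samples` variants and count occurrences of each one.

    The Counter value for a variant equals its duplicates plus one.
    """
    rng = random.Random(seed)
    canonical = {tuple(sorted(edge)) for edge in edges}
    return Counter(
        distort(canonical, nodes, flip_prob, rng) for _ in range(n_samples)
    )


def variant_probabilities(pool):
    """Approximate each variant's probability as occurrences / pool size."""
    total = sum(pool.values())
    return {variant: count / total for variant, count in pool.items()}


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Usage on a toy 5-node path network (illustrative data only).
nodes = range(5)
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
pool = generate_pool(edges, nodes, n_samples=1000)
probs = variant_probabilities(pool)
most_frequent = max(probs, key=probs.get)
print(len(pool), probs[most_frequent], shannon_entropy(probs.values()))
```

Storing each variant as a `frozenset` of sorted node pairs makes duplicate detection a dictionary lookup. The entropy here is taken over the whole pool; the article's use of entropy to select a variant for planned operations is more specific and is not reproduced by this sketch.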