Resampling Effects on Significance Analysis of Network Clustering and Ranking

Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling ass...


Bibliographic Details
Main authors: Mirshahvalad, Atieh; Beauchesne, Olivier H.; Archambault, Éric; Rosvall, Martin
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3553110/
https://www.ncbi.nlm.nih.gov/pubmed/23372677
http://dx.doi.org/10.1371/journal.pone.0053943
author Mirshahvalad, Atieh
Beauchesne, Olivier H.
Archambault, Éric
Rosvall, Martin
collection PubMed
description Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling assumes independence between samples, while the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984–2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains such dependencies. We find that citation resampling underestimates the variance of link weights. Moreover, this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to the link-weight variances of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies we can maintain in the resampling scheme, the earlier we can predict structural change.
format Online
Article
Text
id pubmed-3553110
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3553110 2013-01-31 Resampling Effects on Significance Analysis of Network Clustering and Ranking Mirshahvalad, Atieh; Beauchesne, Olivier H.; Archambault, Éric; Rosvall, Martin PLoS One Research Article
Public Library of Science 2013-01-23 /pmc/articles/PMC3553110/ /pubmed/23372677 http://dx.doi.org/10.1371/journal.pone.0053943 Text en © 2013 Mirshahvalad et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Resampling Effects on Significance Analysis of Network Clustering and Ranking
topic Research Article