Cargando…
Comparison of large networks with sub-sampling strategies
Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Mos...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Nature Publishing Group
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4933923/ https://www.ncbi.nlm.nih.gov/pubmed/27380992 http://dx.doi.org/10.1038/srep28955 |
_version_ | 1782441250024062976 |
---|---|
author | Ali, Waqar Wegner, Anatol E. Gaunt, Robert E. Deane, Charlotte M. Reinert, Gesine |
author_facet | Ali, Waqar Wegner, Anatol E. Gaunt, Robert E. Deane, Charlotte M. Reinert, Gesine |
author_sort | Ali, Waqar |
collection | PubMed |
description | Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data. Netdis is such a score that defines network similarity through the counts of small sub-graphs in the local neighbourhood of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods which links naturally with the framework of network comparisons through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets. |
format | Online Article Text |
id | pubmed-4933923 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-49339232016-07-08 Comparison of large networks with sub-sampling strategies Ali, Waqar Wegner, Anatol E. Gaunt, Robert E. Deane, Charlotte M. Reinert, Gesine Sci Rep Article Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data. Netdis is such a score that defines network similarity through the counts of small sub-graphs in the local neighbourhood of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods which links naturally with the framework of network comparisons through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets. Nature Publishing Group 2016-07-06 /pmc/articles/PMC4933923/ /pubmed/27380992 http://dx.doi.org/10.1038/srep28955 Text en Copyright © 2016, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Ali, Waqar Wegner, Anatol E. Gaunt, Robert E. Deane, Charlotte M. Reinert, Gesine Comparison of large networks with sub-sampling strategies |
title | Comparison of large networks with sub-sampling strategies |
title_full | Comparison of large networks with sub-sampling strategies |
title_fullStr | Comparison of large networks with sub-sampling strategies |
title_full_unstemmed | Comparison of large networks with sub-sampling strategies |
title_short | Comparison of large networks with sub-sampling strategies |
title_sort | comparison of large networks with sub-sampling strategies |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4933923/ https://www.ncbi.nlm.nih.gov/pubmed/27380992 http://dx.doi.org/10.1038/srep28955 |
work_keys_str_mv | AT aliwaqar comparisonoflargenetworkswithsubsamplingstrategies AT wegneranatole comparisonoflargenetworkswithsubsamplingstrategies AT gauntroberte comparisonoflargenetworkswithsubsamplingstrategies AT deanecharlottem comparisonoflargenetworkswithsubsamplingstrategies AT reinertgesine comparisonoflargenetworkswithsubsamplingstrategies |