Cargando…
The ground truth about metadata and community detection in networks
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
American Association for the Advancement of Science
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5415338/ https://www.ncbi.nlm.nih.gov/pubmed/28508065 http://dx.doi.org/10.1126/sciadv.1602548 |
_version_ | 1783233506517712896 |
---|---|
author | Peel, Leto Larremore, Daniel B. Clauset, Aaron |
author_facet | Peel, Leto Larremore, Daniel B. Clauset, Aaron |
author_sort | Peel, Leto |
collection | PubMed |
description | Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures. |
format | Online Article Text |
id | pubmed-5415338 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | American Association for the Advancement of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-54153382017-05-15 The ground truth about metadata and community detection in networks Peel, Leto Larremore, Daniel B. Clauset, Aaron Sci Adv Research Articles Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures. American Association for the Advancement of Science 2017-05-03 /pmc/articles/PMC5415338/ /pubmed/28508065 http://dx.doi.org/10.1126/sciadv.1602548 Text en Copyright © 2017, The Authors http://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (http://creativecommons.org/licenses/by-nc/4.0/) , which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited. |
spellingShingle | Research Articles Peel, Leto Larremore, Daniel B. Clauset, Aaron The ground truth about metadata and community detection in networks |
title | The ground truth about metadata and community detection in networks |
title_full | The ground truth about metadata and community detection in networks |
title_fullStr | The ground truth about metadata and community detection in networks |
title_full_unstemmed | The ground truth about metadata and community detection in networks |
title_short | The ground truth about metadata and community detection in networks |
title_sort | ground truth about metadata and community detection in networks |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5415338/ https://www.ncbi.nlm.nih.gov/pubmed/28508065 http://dx.doi.org/10.1126/sciadv.1602548 |
work_keys_str_mv | AT peelleto thegroundtruthaboutmetadataandcommunitydetectioninnetworks AT larremoredanielb thegroundtruthaboutmetadataandcommunitydetectioninnetworks AT clausetaaron thegroundtruthaboutmetadataandcommunitydetectioninnetworks AT peelleto groundtruthaboutmetadataandcommunitydetectioninnetworks AT larremoredanielb groundtruthaboutmetadataandcommunitydetectioninnetworks AT clausetaaron groundtruthaboutmetadataandcommunitydetectioninnetworks |