Cargando…

The ground truth about metadata and community detection in networks

Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is...

Descripción completa

Detalles Bibliográficos
Autores principales: Peel, Leto, Larremore, Daniel B., Clauset, Aaron
Formato: Online Artículo Texto
Lenguaje:English
Publicado: American Association for the Advancement of Science 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5415338/
https://www.ncbi.nlm.nih.gov/pubmed/28508065
http://dx.doi.org/10.1126/sciadv.1602548
_version_ 1783233506517712896
author Peel, Leto
Larremore, Daniel B.
Clauset, Aaron
author_facet Peel, Leto
Larremore, Daniel B.
Clauset, Aaron
author_sort Peel, Leto
collection PubMed
description Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
format Online
Article
Text
id pubmed-5415338
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher American Association for the Advancement of Science
record_format MEDLINE/PubMed
spelling pubmed-54153382017-05-15 The ground truth about metadata and community detection in networks Peel, Leto Larremore, Daniel B. Clauset, Aaron Sci Adv Research Articles Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures. American Association for the Advancement of Science 2017-05-03 /pmc/articles/PMC5415338/ /pubmed/28508065 http://dx.doi.org/10.1126/sciadv.1602548 Text en Copyright © 2017, The Authors http://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (http://creativecommons.org/licenses/by-nc/4.0/) , which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.
spellingShingle Research Articles
Peel, Leto
Larremore, Daniel B.
Clauset, Aaron
The ground truth about metadata and community detection in networks
title The ground truth about metadata and community detection in networks
title_full The ground truth about metadata and community detection in networks
title_fullStr The ground truth about metadata and community detection in networks
title_full_unstemmed The ground truth about metadata and community detection in networks
title_short The ground truth about metadata and community detection in networks
title_sort ground truth about metadata and community detection in networks
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5415338/
https://www.ncbi.nlm.nih.gov/pubmed/28508065
http://dx.doi.org/10.1126/sciadv.1602548
work_keys_str_mv AT peelleto thegroundtruthaboutmetadataandcommunitydetectioninnetworks
AT larremoredanielb thegroundtruthaboutmetadataandcommunitydetectioninnetworks
AT clausetaaron thegroundtruthaboutmetadataandcommunitydetectioninnetworks
AT peelleto groundtruthaboutmetadataandcommunitydetectioninnetworks
AT larremoredanielb groundtruthaboutmetadataandcommunitydetectioninnetworks
AT clausetaaron groundtruthaboutmetadataandcommunitydetectioninnetworks