From components to communities: bringing network science to clustering for molecular epidemiology
Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold...
Main Authors: | Liu, Molly; Chato, Connor; Poon, Art F Y |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Reflections |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10175948/ https://www.ncbi.nlm.nih.gov/pubmed/37187604 http://dx.doi.org/10.1093/ve/vead026 |
_version_ | 1785040326469091328 |
---|---|
author | Liu, Molly Chato, Connor Poon, Art F Y |
author_facet | Liu, Molly Chato, Connor Poon, Art F Y |
author_sort | Liu, Molly |
collection | PubMed |
description | Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold. The result is often represented as a network or graph of nodes. A connected component is a set of interconnected nodes in a graph that are not connected to any other node. The prevailing approach to pairwise clustering is to map clusters to the connected components of the graph on a one-to-one basis. We propose that this definition of clusters is unnecessarily rigid. For instance, the connected components can collapse into one cluster by the addition of a single sequence that bridges nodes in the respective components. Moreover, the distance thresholds typically used for viruses like HIV-1 tend to exclude a large proportion of new sequences, making it difficult to train models for predicting cluster growth. These issues may be resolved by revisiting how we define clusters from genetic distances. Community detection is a promising class of clustering methods from the field of network science. A community is a set of nodes that are more densely inter-connected relative to the number of their connections to external nodes. Thus, a connected component may be partitioned into two or more communities. Here we describe community detection methods in the context of genetic clustering for epidemiology, demonstrate how a popular method (Markov clustering) enables us to resolve variation in transmission rates within a giant connected component of HIV-1 sequences, and identify current challenges and directions for further work. |
format | Online Article Text |
id | pubmed-10175948 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10175948 2023-05-13 From components to communities: bringing network science to clustering for molecular epidemiology Liu, Molly Chato, Connor Poon, Art F Y Virus Evol Reflections Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold. The result is often represented as a network or graph of nodes. A connected component is a set of interconnected nodes in a graph that are not connected to any other node. The prevailing approach to pairwise clustering is to map clusters to the connected components of the graph on a one-to-one basis. We propose that this definition of clusters is unnecessarily rigid. For instance, the connected components can collapse into one cluster by the addition of a single sequence that bridges nodes in the respective components. Moreover, the distance thresholds typically used for viruses like HIV-1 tend to exclude a large proportion of new sequences, making it difficult to train models for predicting cluster growth. These issues may be resolved by revisiting how we define clusters from genetic distances. Community detection is a promising class of clustering methods from the field of network science. A community is a set of nodes that are more densely inter-connected relative to the number of their connections to external nodes. Thus, a connected component may be partitioned into two or more communities. Here we describe community detection methods in the context of genetic clustering for epidemiology, demonstrate how a popular method (Markov clustering) enables us to resolve variation in transmission rates within a giant connected component of HIV-1 sequences, and identify current challenges and directions for further work.
Oxford University Press 2023-04-25 /pmc/articles/PMC10175948/ /pubmed/37187604 http://dx.doi.org/10.1093/ve/vead026 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Reflections Liu, Molly Chato, Connor Poon, Art F Y From components to communities: bringing network science to clustering for molecular epidemiology |
title | From components to communities: bringing network science to clustering for molecular epidemiology |
title_full | From components to communities: bringing network science to clustering for molecular epidemiology |
title_fullStr | From components to communities: bringing network science to clustering for molecular epidemiology |
title_full_unstemmed | From components to communities: bringing network science to clustering for molecular epidemiology |
title_short | From components to communities: bringing network science to clustering for molecular epidemiology |
title_sort | from components to communities: bringing network science to clustering for molecular epidemiology |
topic | Reflections |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10175948/ https://www.ncbi.nlm.nih.gov/pubmed/37187604 http://dx.doi.org/10.1093/ve/vead026 |
work_keys_str_mv | AT liumolly fromcomponentstocommunitiesbringingnetworksciencetoclusteringformolecularepidemiology AT chatoconnor fromcomponentstocommunitiesbringingnetworksciencetoclusteringformolecularepidemiology AT poonartfy fromcomponentstocommunitiesbringingnetworksciencetoclusteringformolecularepidemiology |
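
The description in this record defines pairwise distance clustering and notes that two connected components can collapse into one cluster when a single bridging sequence is added. The following is a minimal Python sketch of that behaviour, not the authors' implementation. The toy sequences, the threshold of 0.25, the use of a simple proportion of differing sites as the genetic distance, and the function names `hamming_distance` and `threshold_components` are all assumptions made for illustration; real analyses typically use model-based distances and far smaller thresholds.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Proportion of differing sites between two aligned sequences (assumed distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def threshold_components(seqs, threshold):
    """Link pairs whose distance falls below the threshold, then
    return the connected components of the resulting graph."""
    adjacency = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if hamming_distance(seqs[u], seqs[v]) < threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, components = set(), []
    for start in seqs:
        if start in seen:
            continue
        # Depth-first search collects every node reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy alignment: two tight pairs that are far from each other.
toy = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "TTTTAAAAAA",
    "D": "TTTTTAAAAA",
}
before = threshold_components(toy, threshold=0.25)

# One extra sequence lying between A and C bridges the two components.
bridged = dict(toy, E="TTAAAAAAAA")
after = threshold_components(bridged, threshold=0.25)

print(len(before), "components before adding E")
print(len(after), "component(s) after adding E")
```

In this toy case the four original sequences form two components, and adding the intermediate sequence E merges all five sequences into one. A community detection method such as Markov clustering would instead look at edge density inside that merged component, which this sketch does not attempt.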