
From components to communities: bringing network science to clustering for molecular epidemiology

Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold...


Bibliographic Details
Main Authors: Liu, Molly; Chato, Connor; Poon, Art F Y
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Reflections
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10175948/
https://www.ncbi.nlm.nih.gov/pubmed/37187604
http://dx.doi.org/10.1093/ve/vead026
author Liu, Molly
Chato, Connor
Poon, Art F Y
collection PubMed
description Defining clusters of epidemiologically related infections is a common problem in the surveillance of infectious disease. A popular method for generating clusters is pairwise distance clustering, which assigns pairs of sequences to the same cluster if their genetic distance falls below some threshold. The result is often represented as a network or graph of nodes. A connected component is a set of interconnected nodes in a graph that are not connected to any other node. The prevailing approach to pairwise clustering is to map clusters to the connected components of the graph on a one-to-one basis. We propose that this definition of clusters is unnecessarily rigid. For instance, the connected components can collapse into one cluster by the addition of a single sequence that bridges nodes in the respective components. Moreover, the distance thresholds typically used for viruses like HIV-1 tend to exclude a large proportion of new sequences, making it difficult to train models for predicting cluster growth. These issues may be resolved by revisiting how we define clusters from genetic distances. Community detection is a promising class of clustering methods from the field of network science. A community is a set of nodes that are more densely inter-connected relative to the number of their connections to external nodes. Thus, a connected component may be partitioned into two or more communities. Here we describe community detection methods in the context of genetic clustering for epidemiology, demonstrate how a popular method (Markov clustering) enables us to resolve variation in transmission rates within a giant connected component of HIV-1 sequences, and identify current challenges and directions for further work.
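The rigidity described in the abstract can be illustrated with a minimal sketch in Python. It builds a graph from pairwise distances, treats each connected component as a cluster, and then shows how one added sequence merges two components. The sequence labels, the distance values, and the threshold of 0.015 are all hypothetical and chosen only for illustration; they are not taken from the article.

```python
def threshold_edges(dist, threshold):
    """Return the pairs of labels whose distance falls below the threshold."""
    return [pair for pair, d in dist.items() if d < threshold]


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical pairwise genetic distances between four sequences.
THRESHOLD = 0.015  # illustrative value only
dist = {
    ("A", "B"): 0.010, ("C", "D"): 0.012,
    ("A", "C"): 0.050, ("A", "D"): 0.060,
    ("B", "C"): 0.040, ("B", "D"): 0.050,
}
nodes = ["A", "B", "C", "D"]
before = connected_components(nodes, threshold_edges(dist, THRESHOLD))

# A hypothetical new sequence E lies within the threshold of both B and C,
# so it bridges the two components.
dist.update({
    ("B", "E"): 0.012, ("C", "E"): 0.013,
    ("A", "E"): 0.030, ("D", "E"): 0.030,
})
after = connected_components(nodes + ["E"], threshold_edges(dist, THRESHOLD))

print("before:", [sorted(c) for c in before])
print("after: ", [sorted(c) for c in after])
```

In this toy data, A-B and C-D form two clusters until E is added, after which all five sequences fall into a single component. A community detection method could still partition that component into more than one community.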
format Online
Article
Text
id pubmed-10175948
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10175948 2023-05-13 From components to communities: bringing network science to clustering for molecular epidemiology. Liu, Molly; Chato, Connor; Poon, Art F Y. Virus Evol. Reflections. Oxford University Press 2023-04-25.
© The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title From components to communities: bringing network science to clustering for molecular epidemiology
topic Reflections