A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation

Bibliographic Details
Main authors: McCloskey, Rosemary M., Poon, Art F. Y.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703573/
https://www.ncbi.nlm.nih.gov/pubmed/29131825
http://dx.doi.org/10.1371/journal.pcbi.1005868
author McCloskey, Rosemary M.
Poon, Art F. Y.
collection PubMed
description Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for the clinical management of many infections. A variety of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale to large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis (where individuals are sampled sooner after infection) rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method assigned about half (46%) as many individuals to clusters as the other methods did. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not in clusters extracted by the other methods. We determined that the computing time for the MMPP method scaled linearly with tree size, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer.
This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where it is critical to identify clusters robustly and accurately for the most cost-effective deployment of outbreak management and prevention resources.
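To illustrate what a Markov-modulated Poisson process is, here is a minimal, generic sketch in Python. It is not the authors' implementation. Their MMPP is fitted to branching events along a phylogenetic tree, whereas this sketch only simulates a two-state MMPP along a single time axis. The function name `simulate_mmpp` and its parameters are hypothetical. A hidden Markov chain switches between a low-rate state and a high-rate state, and events occur as a Poisson process whose rate depends on the current state. Bursts of events in the high-rate state loosely correspond to clusters of rapid transmission.

```python
import random


def simulate_mmpp(rates, switch_rates, t_max, seed=None):
    """Simulate a two-state Markov-modulated Poisson process on [0, t_max).

    Hypothetical illustration only; this is not the MMPP clustering method
    described in the article, which operates on a phylogenetic tree.

    rates[i]        -- Poisson event rate while the hidden chain is in state i
    switch_rates[i] -- rate of leaving state i (must be > 0); state i -> 1 - i
    Returns (events, path): sorted event times and a list of
    (start_time, state) pairs describing the hidden state trajectory.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    events = []
    path = [(0.0, 0)]
    while t < t_max:
        # Exponential dwell time before the hidden chain switches state.
        dwell = rng.expovariate(switch_rates[state])
        end = min(t + dwell, t_max)
        # Poisson events at the current state's rate within [t, end).
        # Restarting the event clock at t is valid because the
        # exponential distribution is memoryless.
        if rates[state] > 0:
            s = t + rng.expovariate(rates[state])
            while s < end:
                events.append(s)
                s += rng.expovariate(rates[state])
        t = end
        if t < t_max:
            state = 1 - state
            path.append((t, state))
    return events, path
```

For example, `simulate_mmpp([0.5, 10.0], [0.2, 1.0], 100.0, seed=1)` yields sparse events while the chain is in state 0 and dense bursts during state-1 intervals. Fitting an MMPP works in the opposite direction: it infers the hidden rate states from the observed events.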
id pubmed-5703573
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS Comput Biol
article_type Research Article
publication_date 2017-11-13
rights © 2017 McCloskey, Poon. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.