
A deep learning approach to real-time HIV outbreak detection using genetic data

Pathogen genomic sequence data are increasingly made available for epidemiological monitoring. A main interest is to identify and assess the potential of infectious disease outbreaks. While popular methods to analyze sequence data often involve phylogenetic tree inference, they are vulnerable to err...


Bibliographic Details
Main Authors: Kupperman, Michael D., Leitner, Thomas, Ke, Ruian
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604978/
https://www.ncbi.nlm.nih.gov/pubmed/36240224
http://dx.doi.org/10.1371/journal.pcbi.1010598
author Kupperman, Michael D.
Leitner, Thomas
Ke, Ruian
collection PubMed
description Pathogen genomic sequence data are increasingly made available for epidemiological monitoring. A main interest is to identify infectious disease outbreaks and assess their potential. While popular methods to analyze sequence data often involve phylogenetic tree inference, they are vulnerable to errors from recombination and impose a high computational cost, making it difficult to obtain real-time results when the number of sequences is in or above the thousands. Here, we propose an alternative strategy for outbreak detection using genomic data, based on deep learning methods developed for image classification. The key idea is to treat a pairwise genetic distance matrix calculated from viral sequences as an image, and to develop convolutional neural network (CNN) models that classify areas of the image showing signatures of an active outbreak, leading to identification of subsets of sequences taken from an active outbreak. We showed that our method is efficient in finding HIV-1 outbreaks with R0 ≥ 2.5, with an overall specificity exceeding 98% and sensitivity better than 92%. We validated our approach using data from HIV-1 CRF01 in Europe, containing both endemic sequences and a well-known dual outbreak in intravenous drug users. Our model accurately identified known outbreak sequences against the background of slower-spreading HIV. Importantly, we detected both outbreaks early on, before they were over, implying that had this method been applied in real time as data became available, one would have been able to intervene and possibly limit the extent of these outbreaks. This approach scales to processing hundreds of thousands of sequences, making it useful for current and future real-time epidemiological investigations, including public health monitoring using large databases and especially rapid outbreak identification.
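The key idea in the description, turning a pairwise genetic distance matrix into an image, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' pipeline: the record does not state which distance measure the paper uses, so the sketch uses the simple proportion of differing sites (p-distance) between aligned sequences, and the function names `pairwise_p_distance` and `as_image` as well as the toy sequences are hypothetical.

```python
import numpy as np


def pairwise_p_distance(seqs):
    """Return an n x n matrix holding the proportion of differing sites
    between every pair of aligned sequences (p-distance; an assumed
    measure, not necessarily the one used in the paper)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # Encode each sequence as a row of byte codes: shape (n, L).
    arr = np.array([list(s.upper().encode("ascii")) for s in seqs],
                   dtype=np.uint8)
    # Broadcast to compare every pair site by site: shape (n, n, L).
    # This uses O(n^2 * L) memory, so it only suits small inputs.
    differs = arr[:, None, :] != arr[None, :, :]
    return differs.mean(axis=2)


def as_image(dist):
    """Scale a distance matrix to [0, 1] and add a trailing channel axis,
    giving the (height, width, 1) layout that image models commonly take."""
    peak = dist.max()
    scaled = dist / peak if peak > 0 else dist
    return scaled[..., np.newaxis].astype(np.float32)


# Hypothetical toy alignment, not real HIV-1 data.
toy = ["ACGTACGT", "ACGTACGA", "TCGTACGA"]
dist = pairwise_p_distance(toy)
image = as_image(dist)
print(dist)
print(image.shape)
```

A CNN classifier would then take arrays like `image` as input. The all-pairs broadcast above is chosen for readability; at the scale mentioned in the description (hundreds of thousands of sequences) the matrix would have to be computed in blocks or with a dedicated tool.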
format Online
Article
Text
id pubmed-9604978
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9604978 2022-10-27 A deep learning approach to real-time HIV outbreak detection using genetic data Kupperman, Michael D. Leitner, Thomas Ke, Ruian PLoS Comput Biol Research Article Public Library of Science 2022-10-14 /pmc/articles/PMC9604978/ /pubmed/36240224 http://dx.doi.org/10.1371/journal.pcbi.1010598 Text en © 2022 Kupperman et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A deep learning approach to real-time HIV outbreak detection using genetic data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604978/
https://www.ncbi.nlm.nih.gov/pubmed/36240224
http://dx.doi.org/10.1371/journal.pcbi.1010598