
Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders

The COVID-19 pandemic exemplified the need for a rapid, effective genomic-based surveillance system to predict emerging SARS-CoV-2 variants and lineages. Traditional molecular epidemiology methods, which leverage public health surveillance or integrated sequence data repositories, are able to charac...


Bibliographic Details
Main Authors: Rancati, Simone; Nicora, Giovanna; Prosperi, Mattia; Bellazzi, Riccardo; Marini, Simone; Salemi, Marco
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634784/
https://www.ncbi.nlm.nih.gov/pubmed/37961168
http://dx.doi.org/10.1101/2023.10.24.563721
author Rancati, Simone
Nicora, Giovanna
Prosperi, Mattia
Bellazzi, Riccardo
Marini, Simone
Salemi, Marco
author_sort Rancati, Simone
collection PubMed
description The COVID-19 pandemic exemplified the need for a rapid, effective genomic-based surveillance system to predict emerging SARS-CoV-2 variants and lineages. Traditional molecular epidemiology methods, which leverage public health surveillance or integrated sequence data repositories, can characterize the evolutionary history of infection waves and genetic evolution but fall short in promptly anticipating future viral genetic alterations. To bridge this gap, we introduce a novel deep learning, autoencoder-based method for anomaly detection in SARS-CoV-2 (DeepAutoCoV). Trained and updated on the public global SARS-CoV-2 GISAID database, DeepAutoCoV identifies Future Dominant Lineages (FDLs), defined as lineages comprising at least 25% of SARS-CoV-2 genomes added in a given week, on a weekly basis, using the Spike (S) protein. Our algorithm is grounded in anomaly detection via an unsupervised approach, which is necessary given that FDLs can be known only a posteriori (i.e., after they have become dominant). We developed two concurrent approaches (a linear unsupervised one and an a posteriori supervised one) to evaluate DeepAutoCoV performance. DeepAutoCoV identifies FDLs with a median lead time of 31 weeks on global data and achieves a positive predictive value ~7x better and 23% higher than the other approaches. Furthermore, it predicts vaccine-related FDLs up to 17 months in advance. Finally, DeepAutoCoV is not only predictive but also interpretable, since it can pinpoint specific mutations within FDLs, generating hypotheses on the potential increases in virulence or transmissibility of a lineage. By integrating genomic surveillance with artificial intelligence, our work marks a transformative step that may provide valuable insights for the optimization of public health prevention and intervention strategies.
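The FDL definition in the abstract (a lineage comprising at least 25% of the genomes added in a given week) and the positive predictive value used for evaluation can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `(week, lineage)` record layout, and the toy data are assumptions made for illustration only, and the autoencoder that produces the flagged lineages is not reproduced here.

```python
from collections import Counter, defaultdict

# From the abstract: an FDL comprises at least 25% of genomes added in a week.
FDL_THRESHOLD = 0.25


def weekly_shares(records):
    """Map week -> {lineage: share of genomes added that week}.

    `records` is an iterable of (week, lineage) pairs, one per genome
    (a hypothetical layout chosen for this sketch).
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1
    shares = {}
    for week, per_lineage in counts.items():
        total = sum(per_lineage.values())
        shares[week] = {lin: n / total for lin, n in per_lineage.items()}
    return shares


def first_dominant_week(records, threshold=FDL_THRESHOLD):
    """Map lineage -> first week in which its share reaches the threshold."""
    first = {}
    for week, per_lineage in sorted(weekly_shares(records).items()):
        for lineage, share in per_lineage.items():
            if share >= threshold and lineage not in first:
                first[lineage] = week
    return first


def positive_predictive_value(flagged, dominant):
    """Fraction of flagged lineages that did become dominant: TP / (TP + FP)."""
    flagged = set(flagged)
    if not flagged:
        return 0.0
    return len(flagged & set(dominant)) / len(flagged)


# Toy data invented for illustration (not GISAID data).
toy = (
    [(1, "A")] * 9 + [(1, "B")] * 1
    + [(2, "A")] * 7 + [(2, "B")] * 3
    + [(3, "A")] * 4 + [(3, "B")] * 5 + [(3, "C")] * 1
)
dominant = first_dominant_week(toy)

# Suppose a detector flagged lineages "B" and "C" as anomalies in week 1.
ppv = positive_predictive_value(["B", "C"], dominant)
```

In this toy setting a lineage's lead time would be the gap between the week it was flagged and its entry in `dominant`; the abstract reports the median of such lead times but does not spell out the exact evaluation protocol, so that step is left out.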
format Online
Article
Text
id pubmed-10634784
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10634784 2023-11-13 bioRxiv Article Cold Spring Harbor Laboratory 2023-10-24 /pmc/articles/PMC10634784/ /pubmed/37961168 http://dx.doi.org/10.1101/2023.10.24.563721 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders
topic Article