Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders
The COVID-19 pandemic exemplified the need for a rapid, effective genomic-based surveillance system to predict emerging SARS-CoV-2 variants and lineages. Traditional molecular epidemiology methods, which leverage public health surveillance or integrated sequence data repositories, are able to charac...
Main authors: | Rancati, Simone; Nicora, Giovanna; Prosperi, Mattia; Bellazzi, Riccardo; Marini, Simone; Salemi, Marco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634784/ https://www.ncbi.nlm.nih.gov/pubmed/37961168 http://dx.doi.org/10.1101/2023.10.24.563721 |
Field | Value |
---|---|
author | Rancati, Simone; Nicora, Giovanna; Prosperi, Mattia; Bellazzi, Riccardo; Marini, Simone; Salemi, Marco |
collection | PubMed |
description | The COVID-19 pandemic exemplified the need for a rapid, effective genomic-based surveillance system to predict emerging SARS-CoV-2 variants and lineages. Traditional molecular epidemiology methods, which leverage public health surveillance or integrated sequence data repositories, can characterize the evolutionary history of infection waves and genetic evolution but fall short in predicting future outlooks and in promptly anticipating viral genetic alterations. To bridge this gap, we introduce a novel deep learning, autoencoder-based method for anomaly detection in SARS-CoV-2 (DeepAutoCoV). Trained and updated on the public global SARS-CoV-2 GISAID database, DeepAutoCoV identifies, on a weekly basis and using the Spike (S) protein, Future Dominant Lineages (FDLs), defined as lineages comprising at least 25% of the SARS-CoV-2 genomes added in a given week. Our algorithm is grounded in anomaly detection via an unsupervised approach, which is necessary because FDLs can be known only a posteriori (i.e., after they have become dominant). We developed two concurrent approaches (a linear unsupervised one and an a posteriori supervised one) to evaluate DeepAutoCoV performance. DeepAutoCoV identifies FDLs with a median lead time of 31 weeks on global data and achieves a positive predictive value ~7x better and 23% higher than the other approaches. Furthermore, it predicts vaccine-related FDLs up to 17 months in advance. Finally, DeepAutoCoV is not only predictive but also interpretable, since it can pinpoint specific mutations within FDLs, generating hypotheses on potential increases in the virulence or transmissibility of a lineage. By integrating genomic surveillance with artificial intelligence, our work marks a transformative step that may provide valuable insights for the optimization of public health prevention and intervention strategies. |
format | Online Article Text |
id | pubmed-10634784 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10634784 2023-11-13 bioRxiv Article Cold Spring Harbor Laboratory 2023-10-24 /pmc/articles/PMC10634784/ /pubmed/37961168 http://dx.doi.org/10.1101/2023.10.24.563721 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634784/ https://www.ncbi.nlm.nih.gov/pubmed/37961168 http://dx.doi.org/10.1101/2023.10.24.563721 |
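
The abstract defines a Future Dominant Lineage (FDL) as a lineage comprising at least 25% of the SARS-CoV-2 genomes added in a given week. The following is a minimal sketch of that threshold check only, not of DeepAutoCoV itself. The function name `dominant_lineages_by_week`, the `(week, lineage)` record layout, and the toy counts are all hypothetical and chosen for illustration; the record does not describe the authors' data format or code.

```python
from collections import Counter, defaultdict

# "at least 25%" is the threshold stated in the abstract.
FDL_THRESHOLD = 0.25


def dominant_lineages_by_week(records, threshold=FDL_THRESHOLD):
    """Map each week to the lineages whose share of that week's genomes
    is at least `threshold`.

    `records` is an iterable of (week, lineage) pairs, one per genome.
    This layout is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1

    result = {}
    for week, lineage_counts in counts.items():
        total = sum(lineage_counts.values())
        result[week] = sorted(
            lineage
            for lineage, n in lineage_counts.items()
            if n / total >= threshold
        )
    return result


# Toy data (hypothetical counts, not taken from GISAID).
records = (
    [("2021-W01", "B.1.1.7")] * 6
    + [("2021-W01", "B.1.351")] * 2
    + [("2021-W01", "P.1")] * 2
    + [("2021-W02", "B.1.1.7")] * 3
    + [("2021-W02", "B.1.617.2")] * 1
)

dominant = dominant_lineages_by_week(records)
for week in sorted(dominant):
    print(week, dominant[week])
```

Note that this check labels dominance only after the fact, which matches the abstract's point that FDLs can be known only a posteriori; anticipating them ahead of time is the job of the anomaly-detection model.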