
Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant

We build statistical models to describe the substitution process in the SARS-CoV-2 as a function of explanatory factors describing the sequence, its function, and more. These models serve two different purposes: first, to gain knowledge about the evolutionary biology of the virus; and second, to pre...


Bibliographic Details
Main Authors: Levinstein Hallak, Keren; Rosset, Saharon
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8964801/
https://www.ncbi.nlm.nih.gov/pubmed/35351970
http://dx.doi.org/10.1038/s42003-022-03198-y
author Levinstein Hallak, Keren
Rosset, Saharon
collection PubMed
description We build statistical models to describe the substitution process in the SARS-CoV-2 as a function of explanatory factors describing the sequence, its function, and more. These models serve two different purposes: first, to gain knowledge about the evolutionary biology of the virus; and second, to predict future mutations in the virus, in particular, non-synonymous amino acid substitutions creating new variants. We use tens of thousands of publicly available SARS-CoV-2 sequences and consider tens of thousands of candidate models. Through a careful validation process, we confirm that our chosen models are indeed able to predict new amino acid substitutions: candidates ranked high by our model are eight times more likely to occur than random amino acid changes. We also show that named variants were highly ranked by our models before their appearance, emphasizing the value of our models for identifying likely variants and potentially utilizing this knowledge in vaccine design and other aspects of the ongoing battle against COVID-19.
format Online
Article
Text
id pubmed-8964801
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8964801 2022-04-20 Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant Levinstein Hallak, Keren Rosset, Saharon Commun Biol Article Nature Publishing Group UK 2022-03-29 /pmc/articles/PMC8964801/ /pubmed/35351970 http://dx.doi.org/10.1038/s42003-022-03198-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8964801/
https://www.ncbi.nlm.nih.gov/pubmed/35351970
http://dx.doi.org/10.1038/s42003-022-03198-y