Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant
We build statistical models to describe the substitution process in the SARS-CoV-2 as a function of explanatory factors describing the sequence, its function, and more. These models serve two different purposes: first, to gain knowledge about the evolutionary biology of the virus; and second, to pre...
Main authors: | Levinstein Hallak, Keren; Rosset, Saharon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8964801/ https://www.ncbi.nlm.nih.gov/pubmed/35351970 http://dx.doi.org/10.1038/s42003-022-03198-y |
_version_ | 1784678299527544832 |
---|---|
author | Levinstein Hallak, Keren Rosset, Saharon |
author_facet | Levinstein Hallak, Keren Rosset, Saharon |
author_sort | Levinstein Hallak, Keren |
collection | PubMed |
description | We build statistical models to describe the substitution process in the SARS-CoV-2 as a function of explanatory factors describing the sequence, its function, and more. These models serve two different purposes: first, to gain knowledge about the evolutionary biology of the virus; and second, to predict future mutations in the virus, in particular, non-synonymous amino acid substitutions creating new variants. We use tens of thousands of publicly available SARS-CoV-2 sequences and consider tens of thousands of candidate models. Through a careful validation process, we confirm that our chosen models are indeed able to predict new amino acid substitutions: candidates ranked high by our model are eight times more likely to occur than random amino acid changes. We also show that named variants were highly ranked by our models before their appearance, emphasizing the value of our models for identifying likely variants and potentially utilizing this knowledge in vaccine design and other aspects of the ongoing battle against COVID-19. |
format | Online Article Text |
id | pubmed-8964801 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8964801 2022-04-20 Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant Levinstein Hallak, Keren Rosset, Saharon Commun Biol Article We build statistical models to describe the substitution process in the SARS-CoV-2 as a function of explanatory factors describing the sequence, its function, and more. These models serve two different purposes: first, to gain knowledge about the evolutionary biology of the virus; and second, to predict future mutations in the virus, in particular, non-synonymous amino acid substitutions creating new variants. We use tens of thousands of publicly available SARS-CoV-2 sequences and consider tens of thousands of candidate models. Through a careful validation process, we confirm that our chosen models are indeed able to predict new amino acid substitutions: candidates ranked high by our model are eight times more likely to occur than random amino acid changes. We also show that named variants were highly ranked by our models before their appearance, emphasizing the value of our models for identifying likely variants and potentially utilizing this knowledge in vaccine design and other aspects of the ongoing battle against COVID-19. Nature Publishing Group UK 2022-03-29 /pmc/articles/PMC8964801/ /pubmed/35351970 http://dx.doi.org/10.1038/s42003-022-03198-y Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Levinstein Hallak, Keren Rosset, Saharon Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant |
title | Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant |
title_full | Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant |
title_fullStr | Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant |
title_full_unstemmed | Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant |
title_short | Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant |
title_sort | statistical modeling of sars-cov-2 substitution processes: predicting the next variant |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8964801/ https://www.ncbi.nlm.nih.gov/pubmed/35351970 http://dx.doi.org/10.1038/s42003-022-03198-y |
work_keys_str_mv | AT levinsteinhallakkeren statisticalmodelingofsarscov2substitutionprocessespredictingthenextvariant AT rossetsaharon statisticalmodelingofsarscov2substitutionprocessespredictingthenextvariant |