Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization
We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the developm...
Main authors: | Zhao, Zhengqiao; Sokhansanj, Bahrad A.; Malhotra, Charvi; Zheng, Kitty; Rosen, Gail L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523987/ https://www.ncbi.nlm.nih.gov/pubmed/32941419 http://dx.doi.org/10.1371/journal.pcbi.1008269 |
_version_ | 1783588465814798336 |
---|---|
author | Zhao, Zhengqiao Sokhansanj, Bahrad A. Malhotra, Charvi Zheng, Kitty Rosen, Gail L. |
author_sort | Zhao, Zhengqiao |
collection | PubMed |
description | We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the development of effective containment strategies and, potentially, therapeutic and vaccine strategies. However, identifying viral subtypes in real time is challenging: SARS-CoV-2 is a novel virus, and the pandemic is rapidly expanding. Viral subtypes may be difficult to detect due to rapid evolution; founder effects are more significant than selection pressure; and the clustering threshold for subtyping is not standardized. We propose to identify mutational signatures of available SARS-CoV-2 sequences using a population-based approach: an entropy measure followed by frequency analysis. These signatures, Informative Subtype Markers (ISMs), define a compact set of nucleotide sites that characterize the most variable (and thus most informative) positions in the viral genomes sequenced from different individuals. Through ISM compression, we find that certain distant nucleotide variants covary, including non-coding and ORF1ab sites covarying with the D614G spike protein mutation, which has become increasingly prevalent as the pandemic has spread. ISMs are also useful for downstream analyses, such as spatiotemporal visualization of viral dynamics. By analyzing sequence data available in the GISAID database, we validate the utility of ISM-based subtyping by comparing spatiotemporal analyses using ISMs to epidemiological studies of viral transmission in Asia, Europe, and the United States. In addition, we show the relationship of ISMs to phylogenetic reconstructions of SARS-CoV-2 evolution, and therefore, ISMs can play an important complementary role to phylogenetic tree-based analysis, such as is done in the Nextstrain project. The developed pipeline dynamically generates ISMs for newly added SARS-CoV-2 sequences and updates the visualization of pandemic spatiotemporal dynamics, and is available on GitHub at https://github.com/EESI/ISM (Jupyter notebook), https://github.com/EESI/ncov_ism (command line tool) and via an interactive website at https://covid19-ism.coe.drexel.edu/. |
format | Online Article Text |
id | pubmed-7523987 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7523987 2020-10-06 PLoS Comput Biol Research Article Public Library of Science 2020-09-17 /pmc/articles/PMC7523987/ /pubmed/32941419 http://dx.doi.org/10.1371/journal.pcbi.1008269 Text en © 2020 Zhao et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Zhao, Zhengqiao Sokhansanj, Bahrad A. Malhotra, Charvi Zheng, Kitty Rosen, Gail L. Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization |
title | Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523987/ https://www.ncbi.nlm.nih.gov/pubmed/32941419 http://dx.doi.org/10.1371/journal.pcbi.1008269 |
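
The description above summarizes the approach as "an entropy measure followed by frequency analysis" over the most variable nucleotide sites. As a rough illustration only, the Python sketch below computes per-site Shannon entropy over a toy alignment, keeps the sites above a cutoff, and counts the resulting marker strings. It is not the authors' pipeline: the function names, the toy sequences, and the `threshold` value are all assumptions made for this example, and it leaves out any handling of gaps or ambiguous bases.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of the characters seen at one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def informative_sites(aligned, threshold):
    """Return 0-based column indices whose entropy exceeds `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [
        i for i in range(length)
        if site_entropy([seq[i] for seq in aligned]) > threshold
    ]


def ism_table(aligned, threshold):
    """Build one marker string per sequence and count how often each occurs."""
    sites = informative_sites(aligned, threshold)
    markers = ["".join(seq[i] for i in sites) for seq in aligned]
    return sites, Counter(markers)


# Toy alignment (made-up data): only columns 1 and 4 vary.
toy = ["ACGTA", "ACGTA", "ATGTC", "ATGTC", "ACGTC"]
sites, freq = ism_table(toy, threshold=0.5)
print(sites)               # indices of the retained columns
print(freq.most_common())  # marker strings ranked by frequency
```

Ranking marker strings by frequency is one simple way to read the "frequency analysis" step; the published method may define it differently.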