
Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization

We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the developm...

Bibliographic Details
Main authors: Zhao, Zhengqiao; Sokhansanj, Bahrad A.; Malhotra, Charvi; Zheng, Kitty; Rosen, Gail L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523987/
https://www.ncbi.nlm.nih.gov/pubmed/32941419
http://dx.doi.org/10.1371/journal.pcbi.1008269
_version_ 1783588465814798336
author Zhao, Zhengqiao
Sokhansanj, Bahrad A.
Malhotra, Charvi
Zheng, Kitty
Rosen, Gail L.
collection PubMed
description We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the development of effective containment strategies and, potentially, therapeutic and vaccine strategies. However, identifying viral subtypes in real-time is challenging: SARS-CoV-2 is a novel virus, and the pandemic is rapidly expanding. Viral subtypes may be difficult to detect due to rapid evolution; founder effects are more significant than selection pressure; and the clustering threshold for subtyping is not standardized. We propose to identify mutational signatures of available SARS-CoV-2 sequences using a population-based approach: an entropy measure followed by frequency analysis. These signatures, Informative Subtype Markers (ISMs), define a compact set of nucleotide sites that characterize the most variable (and thus most informative) positions in the viral genomes sequenced from different individuals. Through ISM compression, we find that certain distant nucleotide variants covary, including non-coding and ORF1ab sites covarying with the D614G spike protein mutation which has become increasingly prevalent as the pandemic has spread. ISMs are also useful for downstream analyses, such as spatiotemporal visualization of viral dynamics. By analyzing sequence data available in the GISAID database, we validate the utility of ISM-based subtyping by comparing spatiotemporal analyses using ISMs to epidemiological studies of viral transmission in Asia, Europe, and the United States. In addition, we show the relationship of ISMs to phylogenetic reconstructions of SARS-CoV-2 evolution, and therefore, ISMs can play an important complementary role to phylogenetic tree-based analysis, such as is done in the Nextstrain project. 
The developed pipeline dynamically generates ISMs for newly added SARS-CoV-2 sequences and updates the visualization of pandemic spatiotemporal dynamics. It is available on GitHub at https://github.com/EESI/ISM (Jupyter notebook) and https://github.com/EESI/ncov_ism (command-line tool), and via an interactive website at https://covid19-ism.coe.drexel.edu/.
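The "entropy measure followed by frequency analysis" named in the description can be illustrated with a minimal Python sketch. This is not the authors' pipeline (see the repositories listed above for that); it assumes sequences are already aligned to equal length, and the function names, the toy alignment, and the 0.5-bit threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy in bits of one alignment column.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    gaps and ambiguity codes are ignored.
    """
    counts = Counter(base for base in column if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def informative_sites(aligned, threshold):
    """Return 0-based positions whose entropy exceeds the threshold."""
    return [
        i for i, column in enumerate(zip(*aligned))
        if site_entropy(column) > threshold
    ]


def ism_frequencies(aligned, sites):
    """Build one marker string per sequence from the chosen sites and count them."""
    return Counter("".join(seq[i] for i in sites) for seq in aligned)


# Toy alignment (hypothetical data, not real SARS-CoV-2 sequences).
aligned = ["ACGTA", "ACGTA", "ATGTC", "ATGTC", "ACGTC"]
sites = informative_sites(aligned, threshold=0.5)  # threshold is illustrative
markers = ism_frequencies(aligned, sites)
print(sites)
print(markers.most_common())
```

In this toy alignment only the second and fifth columns vary, so the marker for each sequence is the two-base string taken from those positions, and counting those strings gives the subtype frequencies.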
format Online
Article
Text
id pubmed-7523987
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7523987 2020-10-06 Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization Zhao, Zhengqiao Sokhansanj, Bahrad A. Malhotra, Charvi Zheng, Kitty Rosen, Gail L. PLoS Comput Biol Research Article Public Library of Science 2020-09-17 /pmc/articles/PMC7523987/ /pubmed/32941419 http://dx.doi.org/10.1371/journal.pcbi.1008269 Text en © 2020 Zhao et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Genetic grouping of SARS-CoV-2 coronavirus sequences using informative subtype markers for pandemic spread visualization
topic Research Article