Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing

BACKGROUND: Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. METHODS: We developed a tool (GNUVID) that integrates whole genome multilocus sequence typing and a supervised machine learning rand...

Bibliographic Details
Main authors: Moustafa, Ahmed M., Planet, Paul J.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7781309/
https://www.ncbi.nlm.nih.gov/pubmed/33398274
http://dx.doi.org/10.1101/2020.12.28.424582
author Moustafa, Ahmed M.
Planet, Paul J.
collection PubMed
description BACKGROUND: Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. METHODS: We developed a tool (GNUVID) that integrates whole genome multilocus sequence typing and a supervised machine learning random forest-based classifier. We used GNUVID to assign sequence type (ST) profiles to each of 69,686 complete, high-quality SARS-CoV-2 genomes available from GISAID as of October 20, 2020. STs were then clustered into clonal complexes (CCs), which were used to train a machine learning classifier. We used this tool to detect potential introduction and exportation events, and to estimate effective viral diversity across locations and over time in 16 US states. RESULTS: GNUVID is a scalable tool for viral genotype classification (available at https://github.com/ahmedmagds/GNUVID) that can quickly process tens of thousands of genomes. Our ST/CC genotyping analysis uncovered dynamic local changes in ST/CC prevalence and diversity, with multiple replacement events in different states. We detected an average of 20.6 putative introductions and 7.5 exportations for each state. Effective viral diversity dropped in all states as shelter-in-place travel restrictions went into effect and increased as restrictions were lifted. Interestingly, our analysis showed a correlation between effective diversity and the date that state-wide mask mandates were imposed. CONCLUSIONS: Our classification tool uncovered multiple introduction and exportation events, as well as waves of expansion and replacement of SARS-CoV-2 genotypes in different states. Combined with future genomic sampling, the GNUVID system could be used to track circulating viral diversity and identify emerging clones and hotspots.
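The typing step described in the abstract can be illustrated with a minimal Python sketch. It is not GNUVID's code: the function names, the toy gene names and sequences, and the numbering by order of first appearance are all hypothetical. It follows the general multilocus sequence typing idea that each distinct sequence at a locus receives an allele number and each distinct allele profile receives a sequence type (ST). The abstract does not say how effective diversity is computed, so the sketch assumes one common definition, the exponential of the Shannon entropy of ST frequencies.

```python
import math
from collections import Counter


def assign_sequence_types(genomes):
    """Assign allele numbers per locus and an ST per unique allele profile.

    `genomes` maps a genome ID to {locus name: nucleotide sequence}.
    Numbers are handed out in order of first appearance (an assumption).
    """
    allele_ids = {}  # locus -> {sequence: allele number}
    st_ids = {}      # allele profile (tuple) -> ST number
    typed = {}
    for genome_id, loci in genomes.items():
        profile = []
        for locus in sorted(loci):
            known = allele_ids.setdefault(locus, {})
            # Reuse the allele number if this sequence was seen before.
            profile.append(known.setdefault(loci[locus], len(known) + 1))
        profile = tuple(profile)
        st = st_ids.setdefault(profile, len(st_ids) + 1)
        typed[genome_id] = (profile, st)
    return typed


def effective_diversity(st_labels):
    """Effective number of STs, assumed here to be exp(Shannon entropy)."""
    counts = Counter(st_labels)
    total = sum(counts.values())
    shannon = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(shannon)


# Hypothetical toy input: four genomes, two loci with made-up sequences.
genomes = {
    "g1": {"ORF1ab": "ATGA", "S": "ATGT"},
    "g2": {"ORF1ab": "ATGA", "S": "ATGT"},
    "g3": {"ORF1ab": "ATGA", "S": "ATGC"},
    "g4": {"ORF1ab": "ATGG", "S": "ATGC"},
}
typed = assign_sequence_types(genomes)
diversity = effective_diversity([st for _, st in typed.values()])
```

In this toy input, g1 and g2 share a profile and therefore an ST, while g3 and g4 each receive a new one. Under the assumed definition, a sample in which every genome has the same ST has an effective diversity of 1.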
format Online
Article
Text
id pubmed-7781309
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-7781309 2021-01-05 Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing Moustafa, Ahmed M. Planet, Paul J. bioRxiv Article
Cold Spring Harbor Laboratory 2020-12-28 /pmc/articles/PMC7781309/ /pubmed/33398274 http://dx.doi.org/10.1101/2020.12.28.424582 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7781309/
https://www.ncbi.nlm.nih.gov/pubmed/33398274
http://dx.doi.org/10.1101/2020.12.28.424582