Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing
BACKGROUND: Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. METHODS: We developed a tool (GNUVID) that integrates whole genome multilocus sequence typing and a supervised machine learning rand...
Main authors: | Moustafa, Ahmed M.; Planet, Paul J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7781309/ https://www.ncbi.nlm.nih.gov/pubmed/33398274 http://dx.doi.org/10.1101/2020.12.28.424582 |
_version_ | 1783631650746269696 |
---|---|
author | Moustafa, Ahmed M.; Planet, Paul J. |
author_facet | Moustafa, Ahmed M.; Planet, Paul J. |
author_sort | Moustafa, Ahmed M. |
collection | PubMed |
description | BACKGROUND: Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. METHODS: We developed a tool (GNUVID) that integrates whole genome multilocus sequence typing and a supervised machine learning random forest-based classifier. We used GNUVID to assign sequence type (ST) profiles to each of 69,686 complete, high-quality SARS-CoV-2 genomes available from GISAID as of October 20th, 2020. STs were then clustered into clonal complexes (CCs), which were then used to train a machine learning classifier. We used this tool to detect potential introduction and exportation events, and to estimate effective viral diversity across locations and over time in 16 US states. RESULTS: GNUVID is a scalable tool for viral genotype classification (available at https://github.com/ahmedmagds/GNUVID) that can be used to quickly process tens of thousands of genomes. Our genotyping ST/CC analysis uncovered dynamic local changes in ST/CC prevalence and diversity, with multiple replacement events in different states. We detected an average of 20.6 putative introductions and 7.5 exportations for each state. Effective viral diversity dropped in all states as shelter-in-place travel restrictions went into effect and increased as restrictions were lifted. Interestingly, our analysis showed a correlation between effective diversity and the date that state-wide mask mandates were imposed. CONCLUSIONS: Our classification tool uncovered multiple introduction and exportation events, as well as waves of expansion and replacement of SARS-CoV-2 genotypes in different states. Combined with future genomic sampling, the GNUVID system could be used to track circulating viral diversity and identify emerging clones and hotspots. |
format | Online Article Text |
id | pubmed-7781309 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-7781309 2021-01-05 Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing Moustafa, Ahmed M. Planet, Paul J. bioRxiv Article Cold Spring Harbor Laboratory 2020-12-28 /pmc/articles/PMC7781309/ /pubmed/33398274 http://dx.doi.org/10.1101/2020.12.28.424582 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Moustafa, Ahmed M. Planet, Paul J. Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing |
title | Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing |
title_full | Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing |
title_fullStr | Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing |
title_full_unstemmed | Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing |
title_short | Emerging SARS-CoV-2 diversity revealed by rapid whole genome sequence typing |
title_sort | emerging sars-cov-2 diversity revealed by rapid whole genome sequence typing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7781309/ https://www.ncbi.nlm.nih.gov/pubmed/33398274 http://dx.doi.org/10.1101/2020.12.28.424582 |
work_keys_str_mv | AT moustafaahmedm emergingsarscov2diversityrevealedbyrapidwholegenomesequencetyping AT planetpaulj emergingsarscov2diversityrevealedbyrapidwholegenomesequencetyping |
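
The description above mentions assigning sequence type (ST) profiles by whole genome multilocus sequence typing. As a rough illustration of the general idea only, and not of how GNUVID itself is implemented, the sketch below numbers each distinct allele per locus and then numbers each distinct combination of allele numbers as an ST. The function name `assign_sequence_types`, the locus names, and the toy sequences are all hypothetical, and the sketch assumes every genome carries the same set of loci.

```python
from collections import defaultdict


def assign_sequence_types(genomes):
    """Assign allele numbers per locus and an ST per unique allele profile.

    genomes: dict mapping genome name -> dict mapping locus -> sequence.
    Assumes (for this sketch) that all genomes share the same loci.
    Returns a dict mapping genome name -> (ST number, allele profile tuple).
    """
    # Fixed locus order so that profiles are comparable across genomes.
    loci = sorted(next(iter(genomes.values())))
    allele_ids = defaultdict(dict)  # locus -> {sequence: allele number}
    st_ids = {}                     # allele profile -> ST number
    assigned = {}

    for name, locus_seqs in genomes.items():
        profile = []
        for locus in loci:
            seq = locus_seqs[locus]
            known = allele_ids[locus]
            if seq not in known:
                # First time this exact sequence is seen at this locus.
                known[seq] = len(known) + 1
            profile.append(known[seq])
        profile = tuple(profile)
        if profile not in st_ids:
            # First time this combination of alleles is seen.
            st_ids[profile] = len(st_ids) + 1
        assigned[name] = (st_ids[profile], profile)
    return assigned


# Toy input: three genomes, two hypothetical loci with made-up sequences.
toy_genomes = {
    "genome_a": {"locus_1": "ATGC", "locus_2": "GGTA"},
    "genome_b": {"locus_1": "ATGC", "locus_2": "GGTA"},
    "genome_c": {"locus_1": "ATGC", "locus_2": "GGTC"},
}
typed = assign_sequence_types(toy_genomes)
for genome_name, (st, alleles) in typed.items():
    print(genome_name, "ST", st, "profile", alleles)
```

In this toy input, `genome_a` and `genome_b` share every allele and so receive the same ST, while `genome_c` differs at one locus and receives a new one. Grouping STs into clonal complexes and training a classifier, as the description outlines, are separate steps not covered here.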