Emerging SARS-CoV-2 Diversity Revealed by Rapid Whole-Genome Sequence Typing

Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. We developed a tool (GNU-based Virus IDentification [GNUVID]) that integrates whole-genome multilocus sequence typing and a supervised machine l...

Bibliographic Details
Main authors: Moustafa, Ahmed M, Planet, Paul J
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449825/
https://www.ncbi.nlm.nih.gov/pubmed/34432021
http://dx.doi.org/10.1093/gbe/evab197
_version_ 1784569494486646784
author Moustafa, Ahmed M
Planet, Paul J
author_facet Moustafa, Ahmed M
Planet, Paul J
author_sort Moustafa, Ahmed M
collection PubMed
description Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. We developed a tool (GNU-based Virus IDentification [GNUVID]) that integrates whole-genome multilocus sequence typing and a supervised machine learning random forest-based classifier. We used GNUVID to assign sequence type (ST) profiles to all high-quality genomes available from GISAID. STs were clustered into clonal complexes (CCs) and then used to train a machine learning classifier. We used this tool to detect potential introduction and exportation events and to estimate effective viral diversity across locations and over time in 16 US states. GNUVID is a highly scalable tool for viral genotype classification (https://github.com/ahmedmagds/GNUVID) that can quickly classify hundreds of thousands of genomes in a way that is consistent with phylogeny. Our genotyping ST/CC analysis uncovered dynamic local changes in ST/CC prevalence and diversity with multiple replacement events in different states, an average of 20.6 putative introductions and 7.5 exportations for each state over the time period analyzed. We introduce the use of effective diversity metrics (Hill numbers) that can be used to estimate the impact of interventions (e.g., travel restrictions, vaccine uptake, mask mandates) on the variation in circulating viruses. Our classification tool uncovered multiple introduction and exportation events, as well as waves of expansion and replacement of SARS-CoV-2 genotypes in different states. GNUVID classification lends itself to measures of ecological diversity, and, with systematic genomic sampling, it could be used to track circulating viral diversity and identify emerging clones and hotspots.
format Online
Article
Text
id pubmed-8449825
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8449825 2021-09-20 Emerging SARS-CoV-2 Diversity Revealed by Rapid Whole-Genome Sequence Typing Moustafa, Ahmed M Planet, Paul J Genome Biol Evol Research Article
Oxford University Press 2021-08-25 /pmc/articles/PMC8449825/ /pubmed/34432021 http://dx.doi.org/10.1093/gbe/evab197 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Emerging SARS-CoV-2 Diversity Revealed by Rapid Whole-Genome Sequence Typing
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449825/
https://www.ncbi.nlm.nih.gov/pubmed/34432021
http://dx.doi.org/10.1093/gbe/evab197