Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences

BACKGROUND: More than 2 million SARS-CoV-2 genome sequences have been generated and shared since the start of the COVID-19 pandemic and constitute a vital information source that informs outbreak control, disease surveillance, and public health policy. The Pango dynamic nomenclature is a popular sys...

Bibliographic Details
Main authors: O’Toole, Áine; Pybus, Oliver G.; Abram, Michael E.; Kelly, Elizabeth J.; Rambaut, Andrew
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8832810/
https://www.ncbi.nlm.nih.gov/pubmed/35148677
http://dx.doi.org/10.1186/s12864-022-08358-2
_version_ 1784648797087858688
author O’Toole, Áine
Pybus, Oliver G.
Abram, Michael E.
Kelly, Elizabeth J.
Rambaut, Andrew
author_facet O’Toole, Áine
Pybus, Oliver G.
Abram, Michael E.
Kelly, Elizabeth J.
Rambaut, Andrew
author_sort O’Toole, Áine
collection PubMed
description BACKGROUND: More than 2 million SARS-CoV-2 genome sequences have been generated and shared since the start of the COVID-19 pandemic and constitute a vital information source that informs outbreak control, disease surveillance, and public health policy. The Pango dynamic nomenclature is a popular system for classifying and naming genetically-distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. It is therefore important to understand how much information about Pango lineage status is contained in spike-only nucleotide sequences. Here we explore how Pango lineages might be reliably designated and assigned to spike-only nucleotide sequences. We survey the genetic diversity of such sequences, and investigate the information they contain about Pango lineage status. RESULTS: Although many lineages, including the main variants of concern, can be identified clearly using spike-only sequences, some spike-only sequences are shared among tens or hundreds of Pango lineages. To facilitate the classification of SARS-CoV-2 lineages using subgenomic sequences we introduce the notion of designating such sequences to a “lineage set”, which represents the range of Pango lineages that are consistent with the observed mutations in a given spike sequence. CONCLUSIONS: We find that many lineages, including the main variants-of-concern, can be reliably identified by spike alone and we define lineage-sets to represent the lineage precision that can be achieved using spike-only nucleotide sequences. These data provide a foundation for the development of software tools that can assign newly-generated spike nucleotide sequences to Pango lineage sets. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08358-2.
format Online
Article
Text
id pubmed-8832810
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8832810 2022-02-15 Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences O’Toole, Áine Pybus, Oliver G. Abram, Michael E. Kelly, Elizabeth J. Rambaut, Andrew BMC Genomics Research Article BACKGROUND: More than 2 million SARS-CoV-2 genome sequences have been generated and shared since the start of the COVID-19 pandemic and constitute a vital information source that informs outbreak control, disease surveillance, and public health policy. The Pango dynamic nomenclature is a popular system for classifying and naming genetically-distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. It is therefore important to understand how much information about Pango lineage status is contained in spike-only nucleotide sequences. Here we explore how Pango lineages might be reliably designated and assigned to spike-only nucleotide sequences. We survey the genetic diversity of such sequences, and investigate the information they contain about Pango lineage status. RESULTS: Although many lineages, including the main variants of concern, can be identified clearly using spike-only sequences, some spike-only sequences are shared among tens or hundreds of Pango lineages. To facilitate the classification of SARS-CoV-2 lineages using subgenomic sequences we introduce the notion of designating such sequences to a “lineage set”, which represents the range of Pango lineages that are consistent with the observed mutations in a given spike sequence. CONCLUSIONS: We find that many lineages, including the main variants-of-concern, can be reliably identified by spike alone and we define lineage-sets to represent the lineage precision that can be achieved using spike-only nucleotide sequences.
These data provide a foundation for the development of software tools that can assign newly-generated spike nucleotide sequences to Pango lineage sets. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08358-2. BioMed Central 2022-02-11 /pmc/articles/PMC8832810/ /pubmed/35148677 http://dx.doi.org/10.1186/s12864-022-08358-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research Article
O’Toole, Áine
Pybus, Oliver G.
Abram, Michael E.
Kelly, Elizabeth J.
Rambaut, Andrew
Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences
title Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences
title_full Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences
title_fullStr Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences
title_full_unstemmed Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences
title_short Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences
title_sort pango lineage designation and assignment using sars-cov-2 spike gene nucleotide sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8832810/
https://www.ncbi.nlm.nih.gov/pubmed/35148677
http://dx.doi.org/10.1186/s12864-022-08358-2
work_keys_str_mv AT otooleaine pangolineagedesignationandassignmentusingsarscov2spikegenenucleotidesequences
AT pybusoliverg pangolineagedesignationandassignmentusingsarscov2spikegenenucleotidesequences
AT abrammichaele pangolineagedesignationandassignmentusingsarscov2spikegenenucleotidesequences
AT kellyelizabethj pangolineagedesignationandassignmentusingsarscov2spikegenenucleotidesequences
AT rambautandrew pangolineagedesignationandassignmentusingsarscov2spikegenenucleotidesequences
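
Illustrative note on the "lineage set" idea from the abstract: a lineage set is described there as the range of Pango lineages consistent with the mutations observed in a given spike sequence. The Python sketch below is one possible reading of that idea, not the authors' method or any published tool. The function names, the exact-match rule on mutation sets, and the toy lineage and mutation labels (`lineage_A`, `m1`, and so on) are all assumptions made up for illustration.

```python
from collections import defaultdict


def build_lineage_sets(designations):
    """Group lineages by spike haplotype.

    designations: iterable of (lineage_name, spike_mutations) pairs, where
    spike_mutations is an iterable of mutation labels relative to a reference.
    Returns a dict mapping each haplotype (a frozenset of mutation labels)
    to the set of lineages in which that haplotype was observed.
    """
    sets_by_haplotype = defaultdict(set)
    for lineage, mutations in designations:
        sets_by_haplotype[frozenset(mutations)].add(lineage)
    return dict(sets_by_haplotype)


def assign_lineage_set(spike_mutations, sets_by_haplotype):
    """Return the lineage set for a query haplotype (empty set if unseen).

    Assumption: a query matches only an identical set of mutations.
    """
    return sets_by_haplotype.get(frozenset(spike_mutations), set())


# Hypothetical toy data: labels are placeholders, not real lineages or mutations.
designations = [
    ("lineage_A", ["m1"]),
    ("lineage_B", ["m1"]),        # shares its spike haplotype with lineage_A
    ("lineage_C", ["m1", "m2"]),  # unique spike haplotype
]
lineage_sets = build_lineage_sets(designations)

print(sorted(assign_lineage_set(["m1"], lineage_sets)))        # ['lineage_A', 'lineage_B']
print(sorted(assign_lineage_set(["m2", "m1"], lineage_sets)))  # ['lineage_C']
```

In this toy case the first query cannot be narrowed below two lineages, while the second resolves to a single lineage, mirroring the abstract's point that some spike-only sequences identify a lineage clearly and others are shared among many.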