
Variant analysis of SARS-CoV-2 genomes

OBJECTIVE: To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). METHODS: Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants by extractin...


Bibliographic Details
Main Authors: Koyama, Takahiko, Platt, Daniel, Parida, Laxmi
Format: Online Article Text
Language: English
Published: World Health Organization 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7375210/
https://www.ncbi.nlm.nih.gov/pubmed/32742035
http://dx.doi.org/10.2471/BLT.20.253591
_version_ 1783561835959549952
author Koyama, Takahiko
Platt, Daniel
Parida, Laxmi
collection PubMed
description OBJECTIVE: To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). METHODS: Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants from pairwise alignments of each genome to the reference genome NC_045512, produced with EMBOSS needle. Nucleotide variants in the coding regions were converted to the corresponding encoded amino acid residues. For clade analysis, we used the open source software Bayesian evolutionary analysis by sampling trees, version 2.5. FINDINGS: We identified 5775 distinct genome variants, including 2969 missense mutations, 1965 synonymous mutations, 484 mutations in the non-coding regions, 142 non-coding deletions, 100 in-frame deletions, 66 non-coding insertions, 36 stop-gained variants, 11 frameshift deletions and two in-frame insertions. The most common variants were the synonymous 3037C > T (6334 samples), P4715L in the open reading frame 1ab (6319 samples) and D614G in the spike protein (6294 samples). We identified six major clades (basal, D614G, L84S, L3606F, D448del and G392D) and 14 subclades. Among the base changes, C > T was the most common, with 1670 distinct variants. CONCLUSION: We found that several variants of the SARS-CoV-2 genome exist and that the D614G clade has become the most common variant since December 2019. The evolutionary analysis indicated structured transmission, with the possibility of multiple introductions into the population.
format Online
Article
Text
id pubmed-7375210
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher World Health Organization
record_format MEDLINE/PubMed
spelling pubmed-7375210 2020-07-31 Variant analysis of SARS-CoV-2 genomes Koyama, Takahiko Platt, Daniel Parida, Laxmi Bull World Health Organ Research World Health Organization 2020-07-01 2020-06-02 /pmc/articles/PMC7375210/ /pubmed/32742035 http://dx.doi.org/10.2471/BLT.20.253591 Text en (c) 2020 The authors; licensee World Health Organization.
This is an open access article distributed under the terms of the Creative Commons Attribution IGO License (http://creativecommons.org/licenses/by/3.0/igo/legalcode), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In any reproduction of this article there should not be any suggestion that WHO or this article endorse any specific organization or products. The use of the WHO logo is not permitted. This notice should be preserved along with the article's original URL.
title Variant analysis of SARS-CoV-2 genomes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7375210/
https://www.ncbi.nlm.nih.gov/pubmed/32742035
http://dx.doi.org/10.2471/BLT.20.253591
work_keys_str_mv AT koyamatakahiko variantanalysisofsarscov2genomes
AT plattdaniel variantanalysisofsarscov2genomes
AT paridalaxmi variantanalysisofsarscov2genomes
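
The description field says that variants were read off pairwise alignments to the reference genome and that nucleotide variants in coding regions were converted to the encoded amino acid residues. The following is a minimal Python sketch of that idea, not the authors' pipeline. It assumes the alignment is already available as two gapped strings of equal length, handles substitutions only, and uses a toy reference sequence. The function names `call_substitutions` and `classify_substitution` and the `cds_start` parameter are hypothetical, introduced here for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def call_substitutions(ref_aln, qry_aln):
    """Return (ref_pos, ref_base, alt_base) for each mismatch column.

    ref_aln and qry_aln are gapped alignment rows of equal length ('-' = gap).
    ref_pos is 1-based and counted on the ungapped reference.
    Columns containing a gap (insertions or deletions) are skipped.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("alignment rows must have equal length")
    subs = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on reference bases
        if r != "-" and q != "-" and r != q:
            subs.append((ref_pos, r, q))
    return subs


def classify_substitution(ref_seq, cds_start, pos, alt):
    """Classify a single-base substitution inside one coding region.

    cds_start is the 1-based reference position of the first codon base
    (a hypothetical parameter; real annotations would supply it).
    Returns (label, kind), e.g. ("D2G", "missense").
    """
    offset = pos - cds_start            # 0-based offset within the CDS
    codon_index = offset // 3           # 0-based codon number
    start = (cds_start - 1) + codon_index * 3
    ref_codon = ref_seq[start:start + 3].upper()
    alt_codon = list(ref_codon)
    alt_codon[offset % 3] = alt.upper()
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE["".join(alt_codon)]
    if alt_aa == ref_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stop-gained"
    else:
        kind = "missense"
    return f"{ref_aa}{codon_index + 1}{alt_aa}", kind


# Toy example: the reference encodes Met-Asp-stop; the query carries A5G.
reference = "ATGGATTAA"
for pos, ref_base, alt_base in call_substitutions("ATGGATTAA", "ATGGGTTAA"):
    print(pos, ref_base, alt_base, classify_substitution(reference, 1, pos, alt_base))
```

A real analysis would also need gene coordinates from the reference annotation and separate handling of insertions, deletions and frameshifts, which this sketch deliberately leaves out.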