
Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution



Bibliographic details
Main authors: Ghosh, Nimisha; Saha, Indrajit; Nandi, Suman; Sharma, Nikhil
Format: Online; Article; Text
Language: English
Published: Elsevier Inc., 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8450220/
https://www.ncbi.nlm.nih.gov/pubmed/34547443
http://dx.doi.org/10.1016/j.ymeth.2021.09.005
Collection: PubMed
Abstract: Since SARS-CoV-2 emerged in Wuhan, China, more than a year ago, it has spread across the world in a very short span of time. Although different vaccines are being rolled out in vaccination programmes around the globe, the mutation of the virus is still a cause of concern for the research community. Hence, it is important to study the constantly evolving virus and its strains in order to provide a more stable form of cure. This motivated the present research, in which we first carried out multiple sequence alignment, using MAFFT, of two datasets: 15359 global SARS-CoV-2 genomes excluding Indian sequences, and 3033 exclusively Indian SARS-CoV-2 genomes. Phylogenetic analyses are then performed using Nextstrain to identify virus clades, and the virus strains are found to be distributed among 5 major clades (clusters), namely 19A, 19B, 20A, 20B and 20C. Next, mutation points are identified as SNPs in each clade, and the top 10 signature SNPs of each clade are selected based on their frequency, i.e. their number of occurrences in the virus genomes. As a result, 50 signature SNPs are identified for each of the two datasets. Of these, 39 (global dataset without Indian sequences) and 41 (exclusively Indian dataset) are unique. In the global dataset, 25 of the 39 unique signature SNPs are non-synonymous and result in 30 amino acid changes in proteins, while in the Indian dataset, 22 of the 41 are non-synonymous and result in 27 amino acid changes. These 30 and 27 amino acid changes are also visualised in their respective protein structures. Finally, to characterise the identified clades, the non-synonymous signature SNPs are used to evaluate the resulting protein changes in terms of biological function with PROVEAN and PolyPhen-2, while I-Mutant 2.0 is used to evaluate their structural stability.
As a consequence, for the global dataset without Indian sequences, G251V in ORF3a (clade 19A), and F308Y in NSP4 and G196V in ORF3a (clade 19B), are the unique amino acid changes responsible for defining each clade, as they are all deleterious and unstable. Such changes common to both the global dataset without Indian sequences and the exclusively Indian dataset are R203M in Nucleocapsid (20B), and T85I in NSP2 and Q57H in ORF3a (20C). For the exclusively Indian sequences, the unique changes of this kind are A97V in RdRp, G339S and G339C in NSP2 (19A), and Q57H in ORF3a (20A).
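The abstract only summarises the signature-SNP step. The Python sketch below illustrates, on made-up toy data, one way per-clade SNPs could be counted against a reference, the top-k most frequent kept as "signature" SNPs, the per-clade lists merged into unique SNPs, and a coding SNP labelled synonymous or non-synonymous. It is not the authors' code. It assumes genomes already aligned column-for-column to the reference (no insertions), a single hypothetical reading frame starting at `cds_start`, and frequency measured as the number of genomes carrying a SNP; the helper names `call_snps`, `signature_snps`, `unique_snps` and `classify` are hypothetical. PROVEAN, PolyPhen-2 and I-Mutant 2.0 scoring is not reproduced.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Standard genetic code (NCBI table 1), indexed with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

Snp = Tuple[int, str, str]  # (1-based reference position, reference base, alternative base)


def call_snps(reference: str, aligned: str) -> List[Snp]:
    """List substitutions of one genome against the reference.

    Assumption: the genome is aligned column-for-column to the reference
    (no insertions relative to it); gaps and ambiguous bases are skipped.
    """
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, aligned), start=1)
        if ref != alt and ref in "ACGT" and alt in "ACGT"
    ]


def signature_snps(
    reference: str, clades: Dict[str, List[str]], top_k: int = 10
) -> Dict[str, List[Snp]]:
    """Keep the top_k SNPs of each clade, ranked by how many genomes carry them."""
    signatures = {}
    for clade, genomes in clades.items():
        counts = Counter(snp for genome in genomes for snp in call_snps(reference, genome))
        signatures[clade] = [snp for snp, _ in counts.most_common(top_k)]
    return signatures


def unique_snps(signatures: Dict[str, List[Snp]]) -> List[Snp]:
    """Merge per-clade lists, counting a SNP shared by several clades only once."""
    return sorted({snp for snps in signatures.values() for snp in snps})


def classify(reference: str, snp: Snp, cds_start: int = 1) -> Tuple[str, str]:
    """Return the amino acid change (e.g. 'G2V') and its synonymous/non-synonymous label.

    cds_start is a hypothetical 1-based start of one reading frame;
    the SNP is assumed to lie at or after it.
    """
    pos, _, alt = snp
    offset = (pos - cds_start) % 3  # position of the SNP inside its codon (0, 1 or 2)
    codon_start = pos - offset      # 1-based first base of that codon
    ref_codon = reference[codon_start - 1:codon_start + 2]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    residue = (codon_start - cds_start) // 3 + 1
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return f"{ref_aa}{residue}{alt_aa}", kind


if __name__ == "__main__":
    # Made-up 12-nt "genome" (ATG GGT TTC AAA -> M G F K) and two toy clades.
    REF = "ATGGGTTTCAAA"
    CLADES = {
        "A": ["ATGGTTTTCAAA", "ATGGTTTTCAAG", "ATGGTTTTCAAA"],
        "B": ["ATGGGTTTCAAG", "ATGGGTTTCAAG", "ATGGGTTTTAAA"],
    }
    sigs = signature_snps(REF, CLADES, top_k=1)
    for snp in unique_snps(sigs):
        print(snp, classify(REF, snp))
```

On this toy data, clade A's top SNP (5, 'G', 'T') is classified as the non-synonymous change G2V, while clade B's top SNP (12, 'A', 'G') is the synonymous change K4K. Counting each SNP once per genome makes "frequency" mean prevalence within a clade; a real SARS-CoV-2 analysis would also need annotated ORF coordinates, since overlapping reading frames can make one SNP produce more than one amino acid change.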
Record ID: pubmed-8450220
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Methods
Dates: 2022-07; 2021-09-20
© 2021 Elsevier Inc. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre, including this research content, immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database, with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.