Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution
Since SARS-CoV-2 emerged in Wuhan, China, more than a year ago, it has spread across the world in a very short span of time. Although different forms of vaccines are being rolled out for vaccination programs around the globe, the mutation of the virus is still a cause of concern among the r...
Main authors: | Ghosh, Nimisha; Saha, Indrajit; Nandi, Suman; Sharma, Nikhil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Inc. 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8450220/ https://www.ncbi.nlm.nih.gov/pubmed/34547443 http://dx.doi.org/10.1016/j.ymeth.2021.09.005 |
_version_ | 1784569590015066112 |
---|---|
author | Ghosh, Nimisha; Saha, Indrajit; Nandi, Suman; Sharma, Nikhil |
author_facet | Ghosh, Nimisha; Saha, Indrajit; Nandi, Suman; Sharma, Nikhil |
author_sort | Ghosh, Nimisha |
collection | PubMed |
description | Since SARS-CoV-2 emerged in Wuhan, China, more than a year ago, it has spread across the world in a very short span of time. Although different forms of vaccines are being rolled out in vaccination programs around the globe, mutation of the virus remains a cause of concern among research communities. Hence, it is important to study the constantly evolving virus and its strains in order to provide a more stable form of cure. This motivated us to conduct this research, in which we first carried out multiple sequence alignment, using MAFFT, of 15359 global SARS-CoV-2 genomes excluding Indian sequences and of 3033 exclusively Indian genomes. Subsequently, phylogenetic analyses were performed using Nextstrain to identify virus clades. The virus strains were found to be distributed among 5 major clades or clusters, viz. 19A, 19B, 20A, 20B and 20C. Thereafter, mutation points were identified as SNPs in each clade, and from each clade the top 10 signature SNPs were selected based on their frequency, i.e. number of occurrences in the virus genome. As a result, 50 such signature SNPs were identified separately for the global dataset without Indian sequences and for the exclusively Indian dataset. Of each set of 50 signature SNPs, 39 and 41 respectively are unique. For the global dataset, 25 non-synonymous signature SNPs (out of 39) resulted in 30 amino acid changes in proteins, while for the Indian dataset 22 non-synonymous signature SNPs (out of 41) resulted in 27 amino acid changes. These 30 and 27 amino acid changes are also visualised in their respective protein structures. Finally, in order to judge the characteristics of the identified clades, the non-synonymous signature SNPs were evaluated for their effect on the biological function of proteins using PROVEAN and PolyPhen-2, while I-Mutant 2.0 was used to evaluate their structural stability. As a consequence, for the global dataset without Indian sequences, G251V in ORF3a in clade 19A, and F308Y in NSP4 and G196V in ORF3a in 19B, are the unique amino acid changes responsible for defining each clade, as they are all deleterious and unstable. Such changes common to both the global dataset without Indian sequences and the exclusively Indian dataset are R203M in Nucleocapsid for 20B, and T85I in NSP2 and Q57H in ORF3a for 20C, while for the exclusively Indian sequences such unique changes are A97V in RdRp, G339S and G339C in NSP2 in 19A, and Q57H in ORF3a in 20A. |
format | Online Article Text |
id | pubmed-8450220 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8450220 2021-09-20 Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution Ghosh, Nimisha Saha, Indrajit Nandi, Suman Sharma, Nikhil Methods Article Since the emergence of SARS-CoV-2 in Wuhan, China more than a year ago, it has spread across the world in a very short span of time. Although, different forms of vaccines are being rolled out for vaccination programs around the globe, the mutation of the virus is still a cause of concern among the research communities. Hence, it is important to study the constantly evolving virus and its strains in order to provide a much more stable form of cure. This fact motivated us to conduct this research where we have initially carried out multiple sequence alignment of 15359 and 3033 global dataset without Indian and the dataset of exclusive Indian SARS-CoV-2 genomes respectively, using MAFFT. Subsequently, phylogenetic analyses are performed using Nextstrain to identify virus clades. Consequently, the virus strains are found to be distributed among 5 major clades or clusters viz. 19A, 19B, 20A, 20B and 20C. Thereafter, mutation points as SNPs are identified in each clade. Henceforth, from each clade top 10 signature SNPs are identified based on their frequency i.e. number of occurrences in the virus genome. As a result, 50 such signature SNPs are individually identified for global dataset without Indian and dataset of exclusive Indian SARS-CoV-2 genomes respectively. Out of each 50 signature SNPs, 39 and 41 unique SNPs are identified among which 25 non-synonymous signature SNPs (out of 39) resulted in 30 amino acid changes in protein while 27 changes in amino acid are identified from 22 non-synonymous signature SNPs (out of 41). These 30 and 27 amino acid changes for the non-synonymous signature SNPs are visualised in their respective protein structure as well. Finally, in order to judge the characteristics of the identified clades, the non-synonymous signature SNPs are considered to evaluate the changes in proteins as biological functions with the sequences using PROVEAN and PolyPhen-2 while I-Mutant 2.0 is used to evaluate their structural stability. As a consequence, for global dataset without Indian sequences, G251V in ORF3a in clade 19A, F308Y and G196V in NSP4 and ORF3a in 19B are the unique amino acid changes which are responsible for defining each clade as they are all deleterious and unstable. Such changes which are common for both global dataset without Indian and dataset of exclusive Indian sequences are R203M in Nucleocapsid for 20B, T85I and Q57H in NSP2 and ORF3a respectively for 20C while for exclusive Indian sequences such unique changes are A97V in RdRp, G339S and G339C in NSP2 in 19A and Q57H in ORF3a in 20A. Elsevier Inc. 2022-07 2021-09-20 /pmc/articles/PMC8450220/ /pubmed/34547443 http://dx.doi.org/10.1016/j.ymeth.2021.09.005 Text en © 2021 Elsevier Inc. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Ghosh, Nimisha Saha, Indrajit Nandi, Suman Sharma, Nikhil Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution |
title | Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution |
title_full | Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution |
title_fullStr | Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution |
title_full_unstemmed | Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution |
title_short | Characterisation of SARS-CoV-2 clades based on signature SNPs unveils continuous evolution |
title_sort | characterisation of sars-cov-2 clades based on signature snps unveils continuous evolution |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8450220/ https://www.ncbi.nlm.nih.gov/pubmed/34547443 http://dx.doi.org/10.1016/j.ymeth.2021.09.005 |
work_keys_str_mv | AT ghoshnimisha characterisationofsarscov2cladesbasedonsignaturesnpsunveilscontinuousevolution AT sahaindrajit characterisationofsarscov2cladesbasedonsignaturesnpsunveilscontinuousevolution AT nandisuman characterisationofsarscov2cladesbasedonsignaturesnpsunveilscontinuousevolution AT sharmanikhil characterisationofsarscov2cladesbasedonsignaturesnpsunveilscontinuousevolution |
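
The abstract in this record describes picking the top 10 signature SNPs per clade by frequency and then counting how many of the resulting 50 are unique across clades. The following is a minimal Python sketch of that selection step only, not the authors' implementation. It assumes that "frequency" means the number of genomes in a clade that carry a given SNP. The function names, the input layout, and the SNP labels are hypothetical toy values, not data from the paper.

```python
from collections import Counter


def signature_snps(clade_to_genome_snps, top_k=10):
    """Return the top_k most frequent SNPs for each clade.

    clade_to_genome_snps maps a clade name to a list of per-genome SNP
    collections (hypothetical layout, assumed for illustration).
    Frequency is assumed to be the number of genomes carrying the SNP.
    """
    result = {}
    for clade, genomes in clade_to_genome_snps.items():
        counts = Counter()
        for snps in genomes:
            # set() ensures each genome contributes at most once per SNP
            counts.update(set(snps))
        # Sort by descending frequency; break ties by label for determinism
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[clade] = [snp for snp, _ in ranked[:top_k]]
    return result


def unique_signature_snps(signatures):
    """Collapse per-clade signature lists into one set of distinct SNPs."""
    return set().union(*signatures.values())


# Toy input: invented SNP labels, two clades, a handful of genomes.
toy = {
    "19A": [{"A100G", "C200T"}, {"A100G"}, {"A100G", "G300A"}],
    "19B": [{"C200T", "T400C"}, {"C200T"}],
}

signatures = signature_snps(toy, top_k=2)
unique = unique_signature_snps(signatures)
print(signatures)
print(len(unique))
```

In this toy input the SNP `C200T` is selected for both clades, so the four selected entries collapse to three distinct SNPs. The same overlap effect would explain how 5 clades with 10 signature SNPs each can yield fewer than 50 unique SNPs, as reported in the abstract.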