Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization

Whole genome sequencing has rapidly progressed in recent years, with sequencing the SARS-CoV-2 genomes, making it a more reliable clinical tool for public health surveillance. This development has resulted in the production of a large amount of genomic data used for various types of genomic explorat...

Descripción completa

Detalles Bibliográficos
Autores principales: Sangeet, Satyam, Khan, Arshad
Formato: Online Artículo Texto
Lenguaje:English
Publicado: SAGE Publications 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9500253/
https://www.ncbi.nlm.nih.gov/pubmed/36157509
http://dx.doi.org/10.1177/11779322221126294
_version_ 1784795177470132224
author Sangeet, Satyam
Khan, Arshad
author_facet Sangeet, Satyam
Khan, Arshad
author_sort Sangeet, Satyam
collection PubMed
description Whole genome sequencing has rapidly progressed in recent years, with sequencing the SARS-CoV-2 genomes, making it a more reliable clinical tool for public health surveillance. This development has resulted in the production of a large amount of genomic data used for various types of genomic exploration. However, without a proper standard protocol, the usage of genomic data for analyzing various biological phenomena, such as mutation and evolution, may result in a propagating risk of using an unvalidated data set. This process could lead to irregular data being generated along with a high risk of altered analysis. Thus, the current study lays out the foundation for a preprocess pipeline using data analysis to analyze the genomic data set for its accuracy. We have used the recent example of SARS-CoV-2 to demonstrate the process overflow that can be utilized for various kinds of biological exploration such as understanding mutational events, evolutionary divergence, and speciation. Our analysis reveals a significant amount of sequence divergence in the gamma variant as compared with the reference genome thereby making the variant less infective and deadly. Moreover, we found regions in the genomic sequence that is more prone to mutational localization thereby altering the structural integrity of the virus resulting in a more reliable molecular viral mechanism. We believe that the current work will help for an initial check of the genomic data followed by the biological assessment of the process overflow which will be beneficial for the variant analysis and mutational uprising.
format Online
Article
Text
id pubmed-9500253
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-95002532022-09-24 Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization Sangeet, Satyam Khan, Arshad Bioinform Biol Insights Original Research Article Whole genome sequencing has rapidly progressed in recent years, with sequencing the SARS-CoV-2 genomes, making it a more reliable clinical tool for public health surveillance. This development has resulted in the production of a large amount of genomic data used for various types of genomic exploration. However, without a proper standard protocol, the usage of genomic data for analyzing various biological phenomena, such as mutation and evolution, may result in a propagating risk of using an unvalidated data set. This process could lead to irregular data being generated along with a high risk of altered analysis. Thus, the current study lays out the foundation for a preprocess pipeline using data analysis to analyze the genomic data set for its accuracy. We have used the recent example of SARS-CoV-2 to demonstrate the process overflow that can be utilized for various kinds of biological exploration such as understanding mutational events, evolutionary divergence, and speciation. Our analysis reveals a significant amount of sequence divergence in the gamma variant as compared with the reference genome thereby making the variant less infective and deadly. Moreover, we found regions in the genomic sequence that is more prone to mutational localization thereby altering the structural integrity of the virus resulting in a more reliable molecular viral mechanism. We believe that the current work will help for an initial check of the genomic data followed by the biological assessment of the process overflow which will be beneficial for the variant analysis and mutational uprising. SAGE Publications 2022-09-21 /pmc/articles/PMC9500253/ /pubmed/36157509 http://dx.doi.org/10.1177/11779322221126294 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by-nc/4.0/This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle Original Research Article
Sangeet, Satyam
Khan, Arshad
Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization
title Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization
title_full Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization
title_fullStr Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization
title_full_unstemmed Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization
title_short Exploratory Data Analysis of Genomic Sequence of Variants of SARS-CoV-2 Reveals Sequence Divergence and Mutational Localization
title_sort exploratory data analysis of genomic sequence of variants of sars-cov-2 reveals sequence divergence and mutational localization
topic Original Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9500253/
https://www.ncbi.nlm.nih.gov/pubmed/36157509
http://dx.doi.org/10.1177/11779322221126294
work_keys_str_mv AT sangeetsatyam exploratorydataanalysisofgenomicsequenceofvariantsofsarscov2revealssequencedivergenceandmutationallocalization
AT khanarshad exploratorydataanalysisofgenomicsequenceofvariantsofsarscov2revealssequencedivergenceandmutationallocalization