
Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 Protein sequences

BACKGROUND AND OBJECTIVE: The world is currently facing a global emergency due to COVID-19, which requires immediate strategies to strengthen healthcare facilities and prevent further deaths. To achieve effective remedies and solutions, research on different aspects, including the genomic and proteo...


Bibliographic Details
Main Authors: Rout, Ranjeet Kumar; Hassan, Sk Sarif; Sheikh, Sabha; Umer, Saiyed; Sahoo, Kshira Sagar; Gandomi, Amir H.
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577876/
https://www.ncbi.nlm.nih.gov/pubmed/34815067
http://dx.doi.org/10.1016/j.compbiomed.2021.105024
_version_ 1784596152371380224
author Rout, Ranjeet Kumar
Hassan, Sk Sarif
Sheikh, Sabha
Umer, Saiyed
Sahoo, Kshira Sagar
Gandomi, Amir H.
author_sort Rout, Ranjeet Kumar
collection PubMed
description BACKGROUND AND OBJECTIVE: The world is currently facing a global emergency due to COVID-19, which requires immediate strategies to strengthen healthcare facilities and prevent further deaths. To achieve effective remedies and solutions, research on different aspects, including the genomic and proteomic level characterizations of SARS-CoV-2, is critical. In this work, the spatial representation/composition and distribution frequency of the 20 amino acids across the primary protein sequences of SARS-CoV-2 were examined according to different parameters. METHOD: To identify the spatial distribution of amino acids over the primary protein sequences of SARS-CoV-2, the Hurst exponent and Shannon entropy were applied as parameters to capture the autocorrelation and the amount of information in the spatial representations. The frequency distribution of each amino acid over the protein sequences was also evaluated. In the case of a one-dimensional sequence, the Hurst exponent (HE) was utilized to characterize fractality because of its linear relationship with the fractal dimension (D), i.e. $D = 2 - HE$. Moreover, binary Shannon entropy was used to measure the uncertainty in a binary sequence and was then applied to calculate amino acid conservation in the primary protein sequences. RESULTS AND CONCLUSION: Fourteen (14) SARS-CoV protein sequences were evaluated and compared with 105 SARS-CoV-2 proteins. The simulation results demonstrate differences in the information collected about the amino acid spatial distribution in the SARS-CoV-2 and SARS-CoV proteins, enabling researchers to distinguish between the two types of CoV. The spatial arrangement of amino acids also reveals similarities and dissimilarities among the important structural proteins E, M, N and S, which is pivotal for establishing an evolutionary tree with other CoV strains.
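The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the one-indicator-sequence-per-residue encoding, and the toy sequence are assumptions made for illustration. The relation D = 2 - HE used in `fractal_dimension` is the standard one for one-dimensional profiles, and estimating HE itself (for example by rescaled-range analysis) is left out.

```python
import math
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def indicator_sequence(protein, residue):
    """Binary spatial representation: 1 where `residue` occurs, 0 elsewhere.

    Assumed encoding; the record does not spell out the authors' exact one.
    """
    return [1 if aa == residue else 0 for aa in protein]


def frequencies(protein):
    """Relative frequency of each of the 20 amino acids in `protein`."""
    if not protein:
        raise ValueError("empty sequence")
    counts = Counter(protein)
    return {aa: counts.get(aa, 0) / len(protein) for aa in AMINO_ACIDS}


def binary_shannon_entropy(bits):
    """Shannon entropy (in bits) of a 0/1 sequence, from the fraction of ones."""
    if not bits:
        raise ValueError("empty sequence")
    p = sum(bits) / len(bits)
    if p == 0.0 or p == 1.0:
        return 0.0  # a constant sequence carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def fractal_dimension(hurst_exponent):
    """Fractal dimension of a one-dimensional profile, D = 2 - HE."""
    return 2.0 - hurst_exponent


if __name__ == "__main__":
    toy = "MKAAKLAM"  # hypothetical toy sequence, not a SARS-CoV-2 protein
    bits = indicator_sequence(toy, "A")
    print(bits)
    print(frequencies(toy)["A"])
    print(binary_shannon_entropy(bits))
    print(fractal_dimension(0.5))
```

Building one indicator sequence per residue gives 20 binary profiles per protein, each of which can be summarized by an entropy value and, given a Hurst estimate, a fractal dimension.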
format Online
Article
Text
id pubmed-8577876
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-8577876 2021-11-10 Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 Protein sequences Rout, Ranjeet Kumar Hassan, Sk Sarif Sheikh, Sabha Umer, Saiyed Sahoo, Kshira Sagar Gandomi, Amir H. Comput Biol Med Article Elsevier Ltd. 2022-02 2021-11-10 /pmc/articles/PMC8577876/ /pubmed/34815067 http://dx.doi.org/10.1016/j.compbiomed.2021.105024 Text en © 2021 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 Protein sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577876/
https://www.ncbi.nlm.nih.gov/pubmed/34815067
http://dx.doi.org/10.1016/j.compbiomed.2021.105024