
Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a serious global challenge requiring urgent and permanent therapeutic solutions. These solutions can only be engineered if the patterns and rate of mutations of the virus can be elucidated. Predicting mutations and the structure of prot...


Bibliographic Details
Main Authors: Broni, Emmanuel; Miller, Whelton A.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9953644/
https://www.ncbi.nlm.nih.gov/pubmed/36831052
http://dx.doi.org/10.3390/biomedicines11020512
_version_ 1784893929024389120
author Broni, Emmanuel
Miller, Whelton A.
collection PubMed
description Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a serious global challenge requiring urgent and permanent therapeutic solutions. These solutions can only be engineered if the patterns and rate of mutations of the virus can be elucidated. Predicting mutations and the structure of proteins based on these mutations has become necessary for early drug and vaccine design in anticipation of future viral mutations. The amino acid composition (AAC) of proteomes and individual viral proteins provides avenues for exploitation, since AACs have previously been used to predict structure, shape and evolutionary rates. Herein, the frequencies of amino acid residues found in 1637 complete proteomes belonging to 11 SARS-CoV-2 variants/lineages were analyzed. Leucine is the most abundant amino acid residue in SARS-CoV-2, with an average AAC of 9.658%, while tryptophan is the least abundant at 1.11%. The AAC and ranking of lysine and glycine varied in the proteome: in some variants glycine had a higher frequency and AAC than lysine, and in other variants the reverse. Tryptophan was also observed to be the most intolerant to mutation in the proteomes of the variants used. A correlogram revealed a very strong correlation of 0.999992 between the B.1.525 (Eta) and B.1.526 (Iota) variants. Furthermore, isoleucine and threonine were observed to have a very strong negative correlation of −0.912, while cysteine and isoleucine had a very strong positive correlation of 0.835 at p < 0.001. The Shapiro-Wilk normality test revealed that AAC values for all the amino acid residues except methionine showed no evidence of non-normality at p < 0.05. Thus, AACs of SARS-CoV-2 variants can be predicted using probability and z-scores. AACs may be beneficial in classifying viral strains, predicting viral disease types, members of protein families and protein interactions, and for diagnostic purposes. They may also be used as a feature, along with other crucial factors, in machine-learning-based algorithms to predict viral mutations. These mutation-predicting algorithms may help in developing effective therapeutics and vaccines for SARS-CoV-2.
format Online
Article
Text
id pubmed-9953644
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9953644 2023-02-25 Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes Broni, Emmanuel Miller, Whelton A. Biomedicines Article MDPI 2023-02-10 /pmc/articles/PMC9953644/ /pubmed/36831052 http://dx.doi.org/10.3390/biomedicines11020512 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9953644/
https://www.ncbi.nlm.nih.gov/pubmed/36831052
http://dx.doi.org/10.3390/biomedicines11020512
work_keys_str_mv AT broniemmanuel computationalanalysispredictscorrelationsamongaminoacidsinsarscov2proteomes
AT millerwheltona computationalanalysispredictscorrelationsamongaminoacidsinsarscov2proteomes
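
Illustrative note: the abstract in this record refers to amino acid composition (AAC), correlations between AAC values, and z-scores. The following is a minimal Python sketch of those three quantities, not the authors' pipeline. The function names, the 20-letter residue alphabet, and the toy input sequence are assumptions made for illustration only; no data from the article is reproduced.

```python
from collections import Counter
from statistics import mean, stdev

# Assumption: only the 20 standard one-letter residue codes are counted.
STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue in `sequence`."""
    residues = [r for r in sequence.upper() if r in STANDARD_RESIDUES]
    if not residues:
        raise ValueError("sequence contains no standard amino acid residues")
    counts = Counter(residues)
    total = len(residues)
    return {r: 100.0 * counts[r] / total for r in STANDARD_RESIDUES}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def z_score(value, sample):
    """Standard score of `value` relative to `sample` (sample std. dev.)."""
    return (value - mean(sample)) / stdev(sample)


# Toy sequence (hypothetical, not a real SARS-CoV-2 protein).
toy_aac = amino_acid_composition("LLWA")
print(toy_aac["L"], toy_aac["W"])
```

In practice, one AAC vector would be computed per proteome, and `pearson` would then be applied either across variants or across residues, depending on which correlation is of interest.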