Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a serious global challenge requiring urgent and permanent therapeutic solutions. These solutions can only be engineered if the patterns and rate of mutations of the virus can be elucidated. Predicting mutations and the structure of prot...
Main authors: | Broni, Emmanuel; Miller, Whelton A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9953644/ https://www.ncbi.nlm.nih.gov/pubmed/36831052 http://dx.doi.org/10.3390/biomedicines11020512 |
_version_ | 1784893929024389120 |
---|---|
author | Broni, Emmanuel Miller, Whelton A. |
author_facet | Broni, Emmanuel Miller, Whelton A. |
author_sort | Broni, Emmanuel |
collection | PubMed |
description | Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a serious global challenge requiring urgent and permanent therapeutic solutions. These solutions can only be engineered if the patterns and rate of mutations of the virus can be elucidated. Predicting mutations and the structure of proteins based on these mutations has become necessary for early drug and vaccine design in anticipation of future viral mutations. The amino acid composition (AAC) of proteomes and individual viral proteins provides avenues for exploitation, since AACs have previously been used to predict structure, shape and evolutionary rates. Herein, the frequency of amino acid residues found in 1637 complete proteomes belonging to 11 SARS-CoV-2 variants/lineages was analyzed. Leucine was the most abundant amino acid residue in SARS-CoV-2, with an average AAC of 9.658%, while tryptophan was the least abundant at 1.11%. The AAC and ranking of lysine and glycine varied in the proteome: for some variants, glycine had a higher frequency and AAC than lysine, and vice versa in other variants. Tryptophan was also observed to be the most intolerant to mutation in the various proteomes for the variants used. A correlogram revealed a very strong correlation of 0.999992 between the B.1.525 (Eta) and B.1.526 (Iota) variants. Furthermore, isoleucine and threonine were observed to have a very strong negative correlation of −0.912, while cysteine and isoleucine had a very strong positive correlation of 0.835 at p < 0.001. The Shapiro-Wilk normality test revealed that AAC values for all the amino acid residues except methionine showed no evidence of non-normality at p < 0.05. Thus, AACs of SARS-CoV-2 variants can be predicted using probability and z-scores. AACs may be beneficial in classifying viral strains, predicting viral disease types, members of protein families and protein interactions, and for diagnostic purposes. They may also be used as a feature along with other crucial factors in machine-learning-based algorithms to predict viral mutations. These mutation-predicting algorithms may help in developing effective therapeutics and vaccines for SARS-CoV-2. |
format | Online Article Text |
id | pubmed-9953644 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9953644 2023-02-25 Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes Broni, Emmanuel Miller, Whelton A. Biomedicines Article MDPI 2023-02-10 /pmc/articles/PMC9953644/ /pubmed/36831052 http://dx.doi.org/10.3390/biomedicines11020512 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Broni, Emmanuel Miller, Whelton A. Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes |
title | Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes |
title_full | Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes |
title_fullStr | Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes |
title_full_unstemmed | Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes |
title_short | Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes |
title_sort | computational analysis predicts correlations among amino acids in sars-cov-2 proteomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9953644/ https://www.ncbi.nlm.nih.gov/pubmed/36831052 http://dx.doi.org/10.3390/biomedicines11020512 |
work_keys_str_mv | AT broniemmanuel computationalanalysispredictscorrelationsamongaminoacidsinsarscov2proteomes AT millerwheltona computationalanalysispredictscorrelationsamongaminoacidsinsarscov2proteomes |
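
The description field mentions three quantities: amino acid composition (AAC) as a percentage, pairwise correlation of AAC values, and z-scores with associated probabilities. The record does not state which software the authors used, so the following is only a minimal Python sketch of how such quantities are commonly computed. The function names (`aac`, `pearson`, `z_score_probability`) and any input sequences are hypothetical and are not taken from the article.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, mean, stdev

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Return the percentage of each standard residue in a protein sequence.

    Non-standard symbols (e.g. X, *, gaps) are ignored. This is the usual
    definition of AAC; the article's exact preprocessing is an assumption.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def z_score_probability(value, samples):
    """Return (z, P(X <= value)) assuming the samples are roughly normal.

    The normality assumption mirrors the Shapiro-Wilk result quoted in the
    description; it should be checked before relying on the probability.
    """
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        raise ValueError("z-score is undefined for a constant sample")
    z = (value - mu) / sigma
    return z, NormalDist().cdf(z)
```

In use, `aac` would be applied to each proteome, the per-variant AAC vectors (or per-residue AAC values across proteomes) passed to `pearson`, and a residue's AAC values across proteomes passed to `z_score_probability`. A normality check such as Shapiro-Wilk is not in the Python standard library and is omitted from this sketch.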