Classifying COVID-19 based on amino acids encoding with machine learning algorithms

COVID-19 disease causes serious respiratory illnesses. Therefore, accurate identification of the viral infection cycle plays a key role in designing appropriate vaccines. The risk of this disease depends on proteins that interact with human receptors. In this paper, we formulate a novel model for CO...

Bibliographic Details
Main authors: Alkady, Walaa, ElBahnasy, Khaled, Leiva, Víctor, Gad, Walaa
Format: Online Article Text
Language: English
Published: Elsevier B.V. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8923015/
https://www.ncbi.nlm.nih.gov/pubmed/35308181
http://dx.doi.org/10.1016/j.chemolab.2022.104535
author Alkady, Walaa
ElBahnasy, Khaled
Leiva, Víctor
Gad, Walaa
collection PubMed
description COVID-19 causes serious respiratory illness. Accurate identification of the viral infection cycle therefore plays a key role in designing appropriate vaccines. The risk of this disease depends on proteins that interact with human receptors. In this paper, we formulate a novel model for COVID-19 named “amino acid encoding based prediction” (AAPred). This model is accurate, classifies the various coronavirus types, and distinguishes SARS-CoV-2 from other coronaviruses. With the AAPred model, we reduce the number of features to enhance performance, selecting the most important ones according to statistical criteria. The protein sequence of SARS-CoV-2 is analyzed to understand the viral infection cycle. Six machine learning classifiers (decision trees, k-nearest neighbors, random forest, support vector machine, bagging ensemble, and gradient boosting) are used to evaluate the model in terms of accuracy, precision, sensitivity, and specificity. We implement the model computationally and apply it to real data from the National Genomics Data Center. The experimental results show that the AAPred model reduces the feature set to seven features. Under 10-fold cross-validation, the average accuracy is 98.69%, precision is 98.72%, sensitivity is 96.81%, and specificity is 97.72%. These results are obtained with features selected by information gain and classified with random forest. The proposed model predicts the type of coronavirus and reduces the number of extracted features. We find that SARS-CoV-2 has physicochemical characteristics similar to those of SARS-CoV in some regions. We also report that SARS-CoV-2 has infection cycles and sequences similar to those of SARS-CoV in some regions, indicating the potential effectiveness of vaccines against SARS-CoV-2. A comparison with deep learning shows results similar to those of our method.
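The pipeline outlined in the description (encode sequences, rank features by information gain, keep the best ones, then score a classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record does not specify the AAPred encoding, so plain amino acid composition is used as a stand-in; information gain is computed on a median split of each feature; the toy sequences and labels are invented; and the function names are hypothetical. The classification step itself (random forest with 10-fold cross-validation) is not reproduced here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Encode a protein sequence as 20 relative residue frequencies.

    Assumption: a stand-in encoding, not necessarily the one used by AAPred.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one numeric feature, split at its median."""
    ordered = sorted(values)
    threshold = ordered[len(ordered) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    if not low or not high:  # the split separates nothing
        return 0.0
    n = len(labels)
    remainder = len(low) / n * entropy(low) + len(high) / n * entropy(high)
    return entropy(labels) - remainder


def select_top_k(X, y, k=7):
    """Return the indices of the k features with the highest information gain."""
    gains = [information_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:k]


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity, and specificity from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented toy data: two hypothetical classes with distinct residue usage.
toy_sequences = ["AAAAKKKK", "AAAKKKKA", "GGGGLLLL", "GGGLLLLG"]
toy_labels = ["class_x", "class_x", "class_y", "class_y"]

X = [aa_composition(s) for s in toy_sequences]
selected = select_top_k(X, toy_labels, k=2)
selected_residues = [AMINO_ACIDS[j] for j in selected]
```

The default `k=7` mirrors the seven features reported in the description; the selected columns of `X` would then be passed to whichever classifier is being evaluated, and `binary_metrics` shows how the four reported measures follow from confusion-matrix counts for one class against the rest.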
format Online
Article
Text
id pubmed-8923015
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-8923015 2022-03-15 Classifying COVID-19 based on amino acids encoding with machine learning algorithms Alkady, Walaa ElBahnasy, Khaled Leiva, Víctor Gad, Walaa Chemometr Intell Lab Syst Article Elsevier B.V. 2022-05-15 2022-03-15 /pmc/articles/PMC8923015/ /pubmed/35308181 http://dx.doi.org/10.1016/j.chemolab.2022.104535 Text en © 2022 Elsevier B.V. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Classifying COVID-19 based on amino acids encoding with machine learning algorithms
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8923015/
https://www.ncbi.nlm.nih.gov/pubmed/35308181
http://dx.doi.org/10.1016/j.chemolab.2022.104535