
Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms

Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate and suppress its spread and to better understand it by deploying digital signal processing (DSP) and machine learning approaches. This study presents an alignment-free approach to classify the SARS-Co...


Bibliographic Details
Main Authors: Singh, Om Prakash, Vallejo, Marta, El-Badawy, Ismail M., Aysha, Ali, Madhanagopal, Jagannathan, Mohd Faudzi, Ahmad Athif
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294595/
https://www.ncbi.nlm.nih.gov/pubmed/34329865
http://dx.doi.org/10.1016/j.compbiomed.2021.104650
_version_ 1783725268346601472
author Singh, Om Prakash
Vallejo, Marta
El-Badawy, Ismail M.
Aysha, Ali
Madhanagopal, Jagannathan
Mohd Faudzi, Ahmad Athif
author_facet Singh, Om Prakash
Vallejo, Marta
El-Badawy, Ismail M.
Aysha, Ali
Madhanagopal, Jagannathan
Mohd Faudzi, Ahmad Athif
author_sort Singh, Om Prakash
collection PubMed
description Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate and suppress its spread and to better understand it by deploying digital signal processing (DSP) and machine learning approaches. This study presents an alignment-free approach to classify SARS-CoV-2 using complementary DNA, which is DNA synthesized from the single-stranded RNA virus. Herein, a total of 1582 samples, with genome sequences of different lengths from different regions, were collected from various data sources and divided into a SARS-CoV-2 group and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity using DSP techniques and ranked them using filter-based feature selection. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers to distinguish SARS-CoV-2 from other coronaviruses. The training dataset was used to evaluate the performance of the classifiers in terms of accuracy and F-measure via 10-fold cross-validation. Kappa scores were estimated to check the influence of unbalanced data. Further, a 10 × 10 cross-validation paired t-test was used to test the best model with unseen data. Random forest was selected as the best model, differentiating SARS-CoV-2 from other coronaviruses and a control group with an accuracy of 97.4 %, sensitivity of 96.2 %, and specificity of 98.2 % when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 s to compute the genome biomarkers, outperforming previous studies.
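As a rough illustration of the three-base periodicity mentioned in the description, the sketch below measures the spectral power of a DNA sequence at frequency 1/3. It is not the authors' method: the record does not say which numerical mapping or which eight biomarkers the study uses. The binary indicator (Voss) mapping, the length normalization, and the function name `three_base_periodicity_power` are all assumptions made for this example.

```python
import cmath
import math


def three_base_periodicity_power(seq: str) -> float:
    """Length-normalized spectral power at frequency 1/3.

    Assumption: each base (A, C, G, T) is mapped to a binary indicator
    sequence (Voss mapping); the powers of the four sequences are summed.
    Characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    # At frequency 1/3 the DFT kernel exp(-2*pi*j*i/3) takes only three
    # distinct values, indexed by the position modulo 3.
    kernel = [cmath.exp(-2j * math.pi * r / 3) for r in range(3)]

    total_power = 0.0
    for base in "ACGT":
        # DFT coefficient of this base's indicator sequence at frequency 1/3.
        coeff = sum(kernel[i % 3] for i, ch in enumerate(seq) if ch == base)
        total_power += abs(coeff) ** 2

    # Dividing by the length is a hypothetical normalization choice so that
    # sequences of different lengths can be compared.
    return total_power / n


if __name__ == "__main__":
    print(three_base_periodicity_power("ATG" * 30))  # strongly periodic
    print(three_base_periodicity_power("A" * 90))    # no period-3 structure
```

A perfectly codon-periodic toy sequence such as repeated "ATG" gives a large value, while a homopolymer gives a value near zero. A real pipeline would derive several such spectral features per genome before feature ranking and classification.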
format Online
Article
Text
id pubmed-8294595
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-8294595 2021-07-21 Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms Singh, Om Prakash Vallejo, Marta El-Badawy, Ismail M. Aysha, Ali Madhanagopal, Jagannathan Mohd Faudzi, Ahmad Athif Comput Biol Med Article Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are working to mitigate and suppress its spread and to better understand it by deploying digital signal processing (DSP) and machine learning approaches. This study presents an alignment-free approach to classify SARS-CoV-2 using complementary DNA, which is DNA synthesized from the single-stranded RNA virus. Herein, a total of 1582 samples, with genome sequences of different lengths from different regions, were collected from various data sources and divided into a SARS-CoV-2 group and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity using DSP techniques and ranked them using filter-based feature selection. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers to distinguish SARS-CoV-2 from other coronaviruses. The training dataset was used to evaluate the performance of the classifiers in terms of accuracy and F-measure via 10-fold cross-validation. Kappa scores were estimated to check the influence of unbalanced data. Further, a 10 × 10 cross-validation paired t-test was used to test the best model with unseen data. Random forest was selected as the best model, differentiating SARS-CoV-2 from other coronaviruses and a control group with an accuracy of 97.4 %, sensitivity of 96.2 %, and specificity of 98.2 % when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 s to compute the genome biomarkers, outperforming previous studies. Elsevier Ltd. 
2021-09 2021-07-21 /pmc/articles/PMC8294595/ /pubmed/34329865 http://dx.doi.org/10.1016/j.compbiomed.2021.104650 Text en © 2021 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Article
Singh, Om Prakash
Vallejo, Marta
El-Badawy, Ismail M.
Aysha, Ali
Madhanagopal, Jagannathan
Mohd Faudzi, Ahmad Athif
Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
title Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
title_full Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
title_fullStr Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
title_full_unstemmed Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
title_short Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
title_sort classification of sars-cov-2 and non-sars-cov-2 using machine learning algorithms
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294595/
https://www.ncbi.nlm.nih.gov/pubmed/34329865
http://dx.doi.org/10.1016/j.compbiomed.2021.104650
work_keys_str_mv AT singhomprakash classificationofsarscov2andnonsarscov2usingmachinelearningalgorithms
AT vallejomarta classificationofsarscov2andnonsarscov2usingmachinelearningalgorithms
AT elbadawyismailm classificationofsarscov2andnonsarscov2usingmachinelearningalgorithms
AT ayshaali classificationofsarscov2andnonsarscov2usingmachinelearningalgorithms
AT madhanagopaljagannathan classificationofsarscov2andnonsarscov2usingmachinelearningalgorithms
AT mohdfaudziahmadathif classificationofsarscov2andnonsarscov2usingmachinelearningalgorithms