
COVID-19 prediction based on genome similarity of human SARS-CoV-2 and bat SARS-CoV-like coronavirus

This paper proposes an efficient and accurate method to predict coronavirus disease 19 (COVID-19) based on the genome similarity of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and a bat SARS-CoV-like coronavirus. We introduce similarity features to distinguish COVID-19 from othe...


Bibliographic Details
Main author: Arslan, Hilal
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423779/
https://www.ncbi.nlm.nih.gov/pubmed/34511707
http://dx.doi.org/10.1016/j.cie.2021.107666
author Arslan, Hilal
collection PubMed
description This paper proposes an efficient and accurate method to predict coronavirus disease 19 (COVID-19) based on the genome similarity of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and a bat SARS-CoV-like coronavirus. We introduce similarity features to distinguish COVID-19 from other human coronaviruses by comparing human coronaviruses with a bat SARS-CoV-like coronavirus. In the proposed method, each human coronavirus sequence is assigned three similarity scores that consider nucleotide similarities and mutations that lead to the strong absence of cytosine and guanine nucleotides. Next, the proposed features are integrated with CpG island features of the genome sequences to improve COVID-19 prediction. Thus, each genome sequence is represented by five real numbers. We demonstrate the effectiveness of the proposed features using six machine learning classifiers on a dataset of genome sequences of human coronaviruses similar to SARS-CoV-2. The performances of the machine learning classifiers are close to each other, and the k-nearest neighbor classifier with similarity features achieves the best results with an accuracy of 99.2%. Moreover, the k-nearest neighbor classifier with the integration of CpG-based and similarity features achieves an accuracy of 99.8%. Experimental results demonstrate that similarity features remarkably decrease the number of false negatives and significantly improve the overall performance. The superiority of the proposed method is also highlighted by comparison with state-of-the-art studies that detect COVID-19 from genome sequences.
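The pipeline outlined in the description (a few numeric features per genome, then a k-nearest neighbor vote) can be illustrated with a minimal sketch. The abstract does not define the three similarity scores or the exact CpG island features, so the functions below are stand-ins chosen for illustration only: a position-wise identity fraction against a reference sequence (assuming the sequences are already aligned), GC content, and the conventional CpG observed/expected ratio. All function names are hypothetical and none of this reproduces the paper's actual features or results.

```python
from collections import Counter


def identity_fraction(seq: str, ref: str) -> float:
    """Fraction of matching positions over the shared length.

    Assumption: seq and ref are already aligned; this is a stand-in
    for a similarity score, not the paper's definition.
    """
    n = min(len(seq), len(ref))
    if n == 0:
        return 0.0
    matches = sum(1 for a, b in zip(seq, ref) if a == b)
    return matches / n


def cpg_features(seq: str) -> tuple[float, float]:
    """Return (GC content, CpG observed/expected ratio) for a sequence."""
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    gc_content = (c + g) / n
    observed = seq.count("CG")   # CpG dinucleotides
    expected = c * g / n         # expected count if C and G were independent
    ratio = observed / expected if expected else 0.0
    return gc_content, ratio


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs; distance is
    squared Euclidean.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((x - y) ** 2 for x, y in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In use, each genome would be mapped to a short feature vector (for example, the identity fraction plus the two CpG values) and passed to `knn_predict` together with labeled training vectors; the choice of `k` and of the distance metric here is arbitrary.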
format Online
Article
Text
id pubmed-8423779
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-8423779 2021-09-08 COVID-19 prediction based on genome similarity of human SARS-CoV-2 and bat SARS-CoV-like coronavirus Arslan, Hilal Comput Ind Eng Article Elsevier Ltd. 2021-11 2021-09-08 /pmc/articles/PMC8423779/ /pubmed/34511707 http://dx.doi.org/10.1016/j.cie.2021.107666 Text en © 2021 Elsevier Ltd. All rights reserved.
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title COVID-19 prediction based on genome similarity of human SARS-CoV-2 and bat SARS-CoV-like coronavirus
topic Article