
Convolutional Neural Network Applied to SARS-CoV-2 Sequence Classification

Bibliographic Details
Main authors: Câmara, Gabriel B. M., Coutinho, Maria G. F., da Silva, Lucileide M. D., Gadelha, Walter V. do N., Torquato, Matheus F., Barbosa, Raquel de M., Fernandes, Marcelo A. C.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371030/
https://www.ncbi.nlm.nih.gov/pubmed/35957287
http://dx.doi.org/10.3390/s22155730
author Câmara, Gabriel B. M.
Coutinho, Maria G. F.
da Silva, Lucileide M. D.
Gadelha, Walter V. do N.
Torquato, Matheus F.
Barbosa, Raquel de M.
Fernandes, Marcelo A. C.
collection PubMed
description COVID-19, the illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus of the Coronaviridae family with a single-stranded positive-sense RNA genome, has spread around the world and has been declared a pandemic by the World Health Organization. As of 17 January 2022, there were more than 329 million cases and more than 5.5 million deaths. Although COVID-19 has a low mortality rate, its high capacity for contagion, spread, and mutation worries the authorities, especially after the emergence of the Omicron variant, which is highly transmissible and more readily infects even vaccinated people. Such outbreaks require elucidation of the taxonomic classification and origin of the virus (SARS-CoV-2) from its genomic sequence to support strategic planning, containment, and treatment of the disease. Thus, this work proposes a high-accuracy technique to classify viruses and other organisms from a genome sequence using a deep learning convolutional neural network (CNN). Unlike other approaches in the literature, the proposed approach does not limit the length of the genome sequence. The results show that the proposed approach accurately distinguishes SARS-CoV-2 from the sequences of other viruses. The results were obtained from 1557 instances of SARS-CoV-2 from the National Center for Biotechnology Information (NCBI) and 14,684 different viruses from the Virus-Host DB. Because a CNN has several adjustable parameters, the tests were performed with forty-eight different architectures; the best of these achieved an accuracy of 91.94 ± 2.62% in classifying viruses into their realms, in addition to 100% accuracy in classifying SARS-CoV-2 into its realm, Riboviria. For the subsequent classification levels (family, genus, and subgenus), this accuracy increased, which shows that the proposed architecture may be viable for classifying the virus that causes COVID-19.
format Online
Article
Text
id pubmed-9371030
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9371030 2022-08-12 Convolutional Neural Network Applied to SARS-CoV-2 Sequence Classification Câmara, Gabriel B. M. Coutinho, Maria G. F. da Silva, Lucileide M. D. Gadelha, Walter V. do N. Torquato, Matheus F. Barbosa, Raquel de M. Fernandes, Marcelo A. C. Sensors (Basel) Article MDPI 2022-07-31 /pmc/articles/PMC9371030/ /pubmed/35957287 http://dx.doi.org/10.3390/s22155730 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Convolutional Neural Network Applied to SARS-CoV-2 Sequence Classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371030/
https://www.ncbi.nlm.nih.gov/pubmed/35957287
http://dx.doi.org/10.3390/s22155730
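
The description field notes that the proposed approach does not limit the length of the genome sequence, but this record does not say how sequences are encoded for the CNN. As a minimal sketch of one common way to obtain a fixed-size input from a sequence of any length (an assumption for illustration, not necessarily the authors' method), normalized k-mer frequencies can be computed as below. The function name `kmer_frequencies` and the choice of `k` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence: str, k: int = 3) -> list[float]:
    """Return normalized k-mer counts (a vector of length 4**k).

    Illustrative encoding only; the catalog record does not state
    which encoding the paper actually uses.
    """
    # Enumerate every possible k-mer in a fixed order so that each
    # position in the output vector always means the same k-mer.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)

    # Treat RNA letters as their DNA equivalents.
    seq = sequence.upper().replace("U", "T")

    # Slide a window of width k over the sequence.
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:  # skip windows with ambiguous bases such as N
            counts[position] += 1

    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


# A short and a much longer sequence map to vectors of the same size.
short = kmer_frequencies("ACGTACGTAC", k=2)
long = kmer_frequencies("ACGTACGTAC" * 3000, k=2)
print(len(short), len(long))  # prints "16 16"
```

The output size depends only on `k`, so genomes of very different lengths can share one network input shape; the trade-off is that positional information along the genome is discarded.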