SARS-CoV-2 virus classification based on stacked sparse autoencoder

Since December 2019, the world has been severely affected by the COVID-19 pandemic, caused by SARS-CoV-2. When a novel virus is identified, early elucidation of the taxonomic classification and origin of its genomic sequence is essential for strategic planning, containment, and...

Bibliographic Details
Main authors: Coutinho, Maria G.F., Câmara, Gabriel B.M., Barbosa, Raquel de M., Fernandes, Marcelo A.C.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9742810/
https://www.ncbi.nlm.nih.gov/pubmed/36530948
http://dx.doi.org/10.1016/j.csbj.2022.12.007
_version_ 1784848598124462080
author Coutinho, Maria G.F.
Câmara, Gabriel B.M.
Barbosa, Raquel de M.
Fernandes, Marcelo A.C.
collection PubMed
description Since December 2019, the world has been severely affected by the COVID-19 pandemic, caused by SARS-CoV-2. When a novel virus is identified, early elucidation of the taxonomic classification and origin of its genomic sequence is essential for strategic planning, containment, and treatment. Deep learning techniques have been used successfully in many viral classification problems associated with viral infection diagnosis, metagenomics, phylogenetics, and analysis. With that motivation, we propose an efficient viral genome classifier for SARS-CoV-2 using a deep neural network based on the stacked sparse autoencoder (SSAE). To obtain the best model performance, we used image representations of the complete genome sequences as the SSAE input. For that purpose, we used a dataset based on a k-mer image representation. We performed four experiments covering different levels of taxonomic classification of SARS-CoV-2. The SSAE achieved classification accuracy between 92% and 100% on the validation set and between 98.9% and 100% when the SARS-CoV-2 samples were used as the test set. SARS-CoV-2 samples were not used during training, only during subsequent tests, in which the model inferred the correct classification of the samples in the vast majority of cases. This indicates that our model can be adapted to classify other emerging viruses. Overall, the results indicate the applicability of this deep learning technique to genome classification problems.
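The description mentions a k-mer image representation of genome sequences without giving the exact mapping. The following is a minimal sketch of one plausible construction, not the authors' confirmed method: it counts every k-mer over the alphabet A, C, G, T, scales the counts to [0, 1], and lays them out row by row on a 2^k x 2^k grid in lexicographic order. The function name `kmer_image`, the lexicographic layout, the peak normalization, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet; ambiguous bases such as N are skipped


def kmer_image(sequence, k=3):
    """Return a 2^k x 2^k grid of normalized k-mer counts (hypothetical layout)."""
    side = 2 ** k  # 4**k possible k-mers fit exactly in a side x side grid
    # Lexicographic index of every possible k-mer, e.g. AA -> 0, AC -> 1, ...
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * (4 ** k)
    seq = sequence.upper()
    # Slide a window of length k over the sequence and count each k-mer
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # window contains only A, C, G, T
            counts[pos] += 1
    peak = max(counts) or 1  # avoid division by zero for empty input
    pixels = [c / peak for c in counts]
    # Reshape the flat pixel list into rows of the square image
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Toy sequence for illustration only; a real input would be a complete genome
image = kmer_image("ACGTACGTNACGT", k=2)
```

Flattening such a grid gives a fixed-length vector regardless of genome length, which is what makes it usable as input to a fixed-width network such as an autoencoder.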
format Online
Article
Text
id pubmed-9742810
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-9742810 2022-12-12 SARS-CoV-2 virus classification based on stacked sparse autoencoder Coutinho, Maria G.F. Câmara, Gabriel B.M. Barbosa, Raquel de M. Fernandes, Marcelo A.C. Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2022-12-09 /pmc/articles/PMC9742810/ /pubmed/36530948 http://dx.doi.org/10.1016/j.csbj.2022.12.007 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title SARS-CoV-2 virus classification based on stacked sparse autoencoder
topic Research Article