WalkIm: Compact image-based encoding for high-performance classification of biological sequences using simple tuning-free CNNs

The classification of biological sequences remains an open problem for a variety of data sets, such as viral and metagenomic sequences. Many studies therefore rely on neural networks, the well-established approach in this field, and focus on designing customized network structures. However, a few works...

Bibliographic Details
Main authors: Akbari Rokn Abadi, Saeedeh; Mohammadi, Amirhossein; Koohi, Somayyeh
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9012348/
https://www.ncbi.nlm.nih.gov/pubmed/35427371
http://dx.doi.org/10.1371/journal.pone.0267106
author Akbari Rokn Abadi, Saeedeh
Mohammadi, Amirhossein
Koohi, Somayyeh
collection PubMed
description The classification of biological sequences remains an open problem for a variety of data sets, such as viral and metagenomic sequences. Many studies therefore rely on neural networks, the well-established approach in this field, and focus on designing customized network structures. However, few works focus on more influential factors, such as the input encoding method or the implementation technology, to address accuracy and efficiency issues in this area. In this work, we therefore propose an image-based encoding method, called WalkIm, whose adoption, even in a simple neural network, provides competitive accuracy and superior efficiency compared to existing classification methods (e.g., VGDC, CASTOR, and DLM-CNN) for a variety of biological sequences. When WalkIm is used to classify various data sets (i.e., viral whole-genome data, metagenomic read data, and metabarcoding data), it achieves the same performance as the existing methods without requiring parameter initialization or network architecture adjustment for each data set. Notably, even for highly mutated data sets, such as coronaviruses, it achieves almost 100% accuracy in classifying their various types. In addition, WalkIm achieves fast convergence during network training and reduces network complexity. The WalkIm method therefore makes it possible to run the classifying neural networks on an ordinary desktop system in a short time. Moreover, we addressed the compatibility of the WalkIm encoding method with free-space optical processing technology. Taking advantage of an optical implementation of the convolutional layers, we showed that the training time can be reduced by up to 500 times. Finally, this encoding method preserves the structure of the generated images under various sequence transformations, such as reverse complement, complement, and reverse.
format Online
Article
Text
id pubmed-9012348
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9012348 2022-04-16 WalkIm: Compact image-based encoding for high-performance classification of biological sequences using simple tuning-free CNNs Akbari Rokn Abadi, Saeedeh Mohammadi, Amirhossein Koohi, Somayyeh PLoS One Research Article Public Library of Science 2022-04-15 /pmc/articles/PMC9012348/ /pubmed/35427371 http://dx.doi.org/10.1371/journal.pone.0267106 Text en © 2022 Akbari Rokn Abadi et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title WalkIm: Compact image-based encoding for high-performance classification of biological sequences using simple tuning-free CNNs
topic Research Article