Genomic style: yet another deep-learning approach to characterize bacterial genome sequences

Bibliographic Details
Main Authors: Yoshimura, Yuka; Hamada, Akifumi; Augey, Yohann; Akiyama, Manato; Sakakibara, Yasubumi
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710696/
https://www.ncbi.nlm.nih.gov/pubmed/36700086
http://dx.doi.org/10.1093/bioadv/vbab039
author Yoshimura, Yuka
Hamada, Akifumi
Augey, Yohann
Akiyama, Manato
Sakakibara, Yasubumi
collection PubMed
description MOTIVATION: Biological sequence classification is the most fundamental task in bioinformatics analysis. For example, in metagenome analysis, binning is a typical type of DNA sequence classification. In order to classify sequences, it is necessary to define sequence features. The k-mer frequency, base composition and alignment-based metrics are commonly used. On the other hand, in the field of image recognition using machine learning, image classification approaches are broadly divided into those based on shape and those based on style. A style matrix was introduced as a method of expressing the style of an image (e.g. color usage and texture). RESULTS: We propose a novel sequence feature, called genomic style, inspired by image classification approaches, for classifying and clustering DNA sequences. As with the style of images, the DNA sequence is considered to have a genomic style unique to the bacterial species, and the style matrix concept is applied to the DNA sequence. Our main aim is to introduce the genomic style as yet another basic sequence feature for the metagenome binning problem, in place of the most commonly used sequence feature, k-mer frequency. Performance evaluations showed that our method using a style matrix has the potential for accurate binning when compared with state-of-the-art binning tools based on k-mer frequency. AVAILABILITY AND IMPLEMENTATION: The source code for the implementation of this genomic style method, along with the dataset for the performance evaluation, is available from https://github.com/friendflower94/binning-style. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-9710696
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9710696 2023-01-24 Genomic style: yet another deep-learning approach to characterize bacterial genome sequences Yoshimura, Yuka Hamada, Akifumi Augey, Yohann Akiyama, Manato Sakakibara, Yasubumi Bioinform Adv Original Article
Oxford University Press 2021-12-01 /pmc/articles/PMC9710696/ /pubmed/36700086 http://dx.doi.org/10.1093/bioadv/vbab039 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genomic style: yet another deep-learning approach to characterize bacterial genome sequences
topic Original Article
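The description contrasts the conventional k-mer frequency feature with a style matrix borrowed from image recognition, but gives no formulas. The sketch below is an illustration under stated assumptions, not the authors' implementation (that is in the binning-style repository linked in the description). It assumes the style matrix is the Gram matrix of convolutional feature maps, as in neural style transfer for images, and it uses random filters from a hypothetical `random_filters` helper in place of trained CNN weights. All function names here are hypothetical.

```python
import random
from itertools import product

BASES = "ACGT"


def kmer_frequency(seq, k=4):
    """Normalized k-mer frequency vector with 4**k entries (lexicographic order)."""
    index = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k].upper())
        if j is not None:  # skip k-mers containing N or other ambiguity codes
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def one_hot(seq):
    """4 x L one-hot encoding; ambiguous bases become all-zero columns."""
    rows = [[0.0] * len(seq) for _ in BASES]
    for pos, base in enumerate(seq.upper()):
        if base in BASES:
            rows[BASES.index(base)][pos] = 1.0
    return rows


def feature_maps(seq, filters):
    """Valid 1-D convolution of the one-hot sequence with each filter, then ReLU.

    filters: list of C filters, each a 4 x w weight matrix.
    """
    x = one_hot(seq)
    maps = []
    for f in filters:
        w = len(f[0])
        row = []
        for t in range(len(seq) - w + 1):
            s = sum(f[b][d] * x[b][t + d] for b in range(4) for d in range(w))
            row.append(max(s, 0.0))
        maps.append(row)
    return maps


def style_matrix(seq, filters):
    """Assumed Gram-matrix style: G[i][j] = mean over positions t of F_i(t) * F_j(t)."""
    maps = feature_maps(seq, filters)
    n = len(maps[0]) if maps else 0
    if n == 0:
        raise ValueError("need at least one filter and a sequence at least as long as the filter width")
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in maps] for fi in maps]


def random_filters(c=8, w=5, seed=0):
    """Hypothetical stand-in for learned CNN filters: C filters of shape 4 x w."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in BASES] for _ in range(c)]


if __name__ == "__main__":
    seq = "ATGCGTACGTTAGC" * 20
    print(len(kmer_frequency(seq, k=4)))  # 256 features for k = 4
    G = style_matrix(seq, random_filters(c=8, w=5))
    print(len(G), len(G[0]))  # 8 x 8 style matrix
```

The two features scale differently: the k-mer vector has 4**k entries regardless of any model, while the Gram matrix is C x C for C filters and, because it averages over positions, its size does not depend on sequence length, which makes fragments of different lengths directly comparable.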