
MetaTransformer: deep metagenomic sequencing read classification using self-attention models

Deep learning has emerged as a paradigm that revolutionizes numerous domains of scientific research. Transformers have been utilized in language modeling, outperforming previous approaches. Therefore, the utilization of deep learning as a tool for analyzing genomic sequences is promising, yielding convincing results in fields such as motif identification and variant calling. DeepMicrobes, a machine learning-based classifier, has recently been introduced for taxonomic prediction at the species and genus levels. However, it relies on complex models based on bidirectional long short-term memory cells, resulting in slow runtimes and excessive memory requirements, hampering its effective usability. We present MetaTransformer, a self-attention-based deep learning metagenomic analysis tool. Our transformer-encoder-based models enable efficient parallelization while outperforming DeepMicrobes in terms of species and genus classification abilities. Furthermore, we investigate approaches to reduce memory consumption and boost performance using different embedding schemes. As a result, we are able to achieve a 2× to 5× speedup for inference compared to DeepMicrobes while keeping a significantly smaller memory footprint. MetaTransformer can be trained in 9 hours for genus and 16 hours for species prediction. Our results demonstrate performance improvements due to self-attention models and the impact of embedding schemes in deep learning on metagenomic sequencing data.
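To make the approach concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a sequencing read is tokenized into overlapping k-mers, embedded, passed through a transformer encoder, and pooled into a per-read taxon prediction. The k-mer length, the hashed ("bucketed") embedding used here to shrink the 4^k vocabulary, and all layer sizes are illustrative assumptions, not the published MetaTransformer configuration.

import torch
import torch.nn as nn

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_tokenize(read: str, k: int = 12) -> torch.Tensor:
    """Encode a read as overlapping k-mer indices in [0, 4**k)."""
    ids = []
    for i in range(len(read) - k + 1):
        idx = 0
        for base in read[i : i + k]:
            idx = idx * 4 + BASES.get(base, 0)  # map ambiguous bases to A
        ids.append(idx)
    return torch.tensor(ids, dtype=torch.long)

class ReadClassifier(nn.Module):
    def __init__(self, num_classes: int, d_model: int = 256,
                 nhead: int = 4, num_layers: int = 2, max_len: int = 512):
        super().__init__()
        # Hashing k-mer ids into a smaller bucket table is one example of an
        # embedding scheme that trades exactness for a smaller memory footprint.
        self.num_buckets = 2 ** 20
        self.embed = nn.Embedding(self.num_buckets, d_model)
        self.pos = nn.Embedding(max_len, d_model)  # learned positional encoding
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=512, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embed(token_ids % self.num_buckets) + self.pos(positions)
        x = self.encoder(x)              # self-attention over k-mer embeddings
        return self.head(x.mean(dim=1))  # mean-pool, then score each taxon

model = ReadClassifier(num_classes=100)
tokens = kmer_tokenize("ACGT" * 30).unsqueeze(0)  # one 120 bp read, batch of 1
logits = model(tokens)                            # shape (1, num_classes)

The modulo hash means distinct k-mers can share an embedding vector; tolerating such collisions is what lets a bucketed table stay far smaller than a full 4^12-entry vocabulary, which is the kind of memory-versus-accuracy trade-off the abstract alludes to.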


Bibliographic Details
Main Authors: Wichmann, Alexander, Buschong, Etienne, Müller, André, Jünger, Daniel, Hildebrandt, Andreas, Hankeln, Thomas, Schmidt, Bertil
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495543/
https://www.ncbi.nlm.nih.gov/pubmed/37705831
http://dx.doi.org/10.1093/nargab/lqad082
_version_ 1785104919416537088
author Wichmann, Alexander
Buschong, Etienne
Müller, André
Jünger, Daniel
Hildebrandt, Andreas
Hankeln, Thomas
Schmidt, Bertil
author_facet Wichmann, Alexander
Buschong, Etienne
Müller, André
Jünger, Daniel
Hildebrandt, Andreas
Hankeln, Thomas
Schmidt, Bertil
author_sort Wichmann, Alexander
collection PubMed
description Deep learning has emerged as a paradigm that revolutionizes numerous domains of scientific research. Transformers have been utilized in language modeling, outperforming previous approaches. Therefore, the utilization of deep learning as a tool for analyzing genomic sequences is promising, yielding convincing results in fields such as motif identification and variant calling. DeepMicrobes, a machine learning-based classifier, has recently been introduced for taxonomic prediction at the species and genus levels. However, it relies on complex models based on bidirectional long short-term memory cells, resulting in slow runtimes and excessive memory requirements, hampering its effective usability. We present MetaTransformer, a self-attention-based deep learning metagenomic analysis tool. Our transformer-encoder-based models enable efficient parallelization while outperforming DeepMicrobes in terms of species and genus classification abilities. Furthermore, we investigate approaches to reduce memory consumption and boost performance using different embedding schemes. As a result, we are able to achieve a 2× to 5× speedup for inference compared to DeepMicrobes while keeping a significantly smaller memory footprint. MetaTransformer can be trained in 9 hours for genus and 16 hours for species prediction. Our results demonstrate performance improvements due to self-attention models and the impact of embedding schemes in deep learning on metagenomic sequencing data.
format Online
Article
Text
id pubmed-10495543
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10495543 2023-09-13 MetaTransformer: deep metagenomic sequencing read classification using self-attention models Wichmann, Alexander Buschong, Etienne Müller, André Jünger, Daniel Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil NAR Genom Bioinform Methods Article Deep learning has emerged as a paradigm that revolutionizes numerous domains of scientific research. Transformers have been utilized in language modeling, outperforming previous approaches. Therefore, the utilization of deep learning as a tool for analyzing genomic sequences is promising, yielding convincing results in fields such as motif identification and variant calling. DeepMicrobes, a machine learning-based classifier, has recently been introduced for taxonomic prediction at the species and genus levels. However, it relies on complex models based on bidirectional long short-term memory cells, resulting in slow runtimes and excessive memory requirements, hampering its effective usability. We present MetaTransformer, a self-attention-based deep learning metagenomic analysis tool. Our transformer-encoder-based models enable efficient parallelization while outperforming DeepMicrobes in terms of species and genus classification abilities. Furthermore, we investigate approaches to reduce memory consumption and boost performance using different embedding schemes. As a result, we are able to achieve a 2× to 5× speedup for inference compared to DeepMicrobes while keeping a significantly smaller memory footprint. MetaTransformer can be trained in 9 hours for genus and 16 hours for species prediction. Our results demonstrate performance improvements due to self-attention models and the impact of embedding schemes in deep learning on metagenomic sequencing data. Oxford University Press 2023-09-11 /pmc/articles/PMC10495543/ /pubmed/37705831 http://dx.doi.org/10.1093/nargab/lqad082 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Article
Wichmann, Alexander
Buschong, Etienne
Müller, André
Jünger, Daniel
Hildebrandt, Andreas
Hankeln, Thomas
Schmidt, Bertil
MetaTransformer: deep metagenomic sequencing read classification using self-attention models
title MetaTransformer: deep metagenomic sequencing read classification using self-attention models
title_full MetaTransformer: deep metagenomic sequencing read classification using self-attention models
title_fullStr MetaTransformer: deep metagenomic sequencing read classification using self-attention models
title_full_unstemmed MetaTransformer: deep metagenomic sequencing read classification using self-attention models
title_short MetaTransformer: deep metagenomic sequencing read classification using self-attention models
title_sort metatransformer: deep metagenomic sequencing read classification using self-attention models
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495543/
https://www.ncbi.nlm.nih.gov/pubmed/37705831
http://dx.doi.org/10.1093/nargab/lqad082
work_keys_str_mv AT wichmannalexander metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels
AT buschongetienne metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels
AT mullerandre metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels
AT jungerdaniel metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels
AT hildebrandtandreas metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels
AT hankelnthomas metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels
AT schmidtbertil metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels