MetaTransformer: deep metagenomic sequencing read classification using self-attention models
Deep learning has emerged as a paradigm that revolutionizes numerous domains of scientific research. Transformers have been utilized in language modeling, outperforming previous approaches. Therefore, the use of deep learning as a tool for analyzing genomic sequences is promising, yielding convincing results in fields such as motif identification and variant calling. DeepMicrobes, a machine learning-based classifier, has recently been introduced for taxonomic prediction at the species and genus level. However, it relies on complex models based on bidirectional long short-term memory cells, resulting in slow runtimes and excessive memory requirements that hamper its usability. We present MetaTransformer, a self-attention-based deep learning tool for metagenomic analysis. Our transformer-encoder-based models enable efficient parallelization while outperforming DeepMicrobes in species- and genus-level classification. Furthermore, we investigate approaches to reduce memory consumption and boost performance using different embedding schemes. As a result, we achieve a 2× to 5× speedup for inference compared to DeepMicrobes while maintaining a significantly smaller memory footprint. MetaTransformer can be trained in 9 hours for genus and 16 hours for species prediction. Our results demonstrate the performance improvements gained from self-attention models and the impact of embedding schemes when applying deep learning to metagenomic sequencing data.
Main Authors: | Wichmann, Alexander, Buschong, Etienne, Müller, André, Jünger, Daniel, Hildebrandt, Andreas, Hankeln, Thomas, Schmidt, Bertil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Methods Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495543/ https://www.ncbi.nlm.nih.gov/pubmed/37705831 http://dx.doi.org/10.1093/nargab/lqad082 |
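The abstract describes a transformer-encoder classifier operating on k-mer tokenized sequencing reads, with the embedding scheme as the main memory/performance lever. As a rough orientation only, here is a minimal, hypothetical PyTorch sketch of that general architecture; it is not MetaTransformer's published code, and the k-mer length, layer sizes, pooling strategy, and class count are all illustrative assumptions.

```python
# Hypothetical sketch of a self-attention read classifier, NOT the
# published MetaTransformer implementation. A read is tokenized into
# overlapping k-mers; each k-mer index is looked up in an embedding
# table, and a transformer encoder pools the sequence into taxon logits.
import torch
import torch.nn as nn

class ReadClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        # The embedding table dominates memory for large k: a full 12-mer
        # vocabulary has 4^12 rows, which is why alternative embedding
        # schemes (e.g. smaller or shared tables) matter for footprint.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, kmer_ids):
        # Positional encodings are omitted for brevity; a real model
        # would add them before the encoder.
        x = self.embed(kmer_ids)         # (batch, seq_len, d_model)
        x = self.encoder(x)              # self-attention across k-mer positions
        return self.head(x.mean(dim=1))  # mean-pool, then logits over taxa

# Toy usage: a batch of 8 reads of length 150, tokenized into 8-mers
# (4^8 = 65536 possible k-mers, 150 - 8 + 1 = 143 tokens per read).
model = ReadClassifier(vocab_size=4**8, num_classes=50)
logits = model(torch.randint(0, 4**8, (8, 143)))  # shape (8, 50)
```

Mean pooling keeps the sketch short; per-token attention pooling or a class token would be equally plausible design choices for read-level prediction.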
_version_ | 1785104919416537088 |
---|---|
author | Wichmann, Alexander Buschong, Etienne Müller, André Jünger, Daniel Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil |
author_facet | Wichmann, Alexander Buschong, Etienne Müller, André Jünger, Daniel Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil |
author_sort | Wichmann, Alexander |
collection | PubMed |
description | Deep learning has emerged as a paradigm that revolutionizes numerous domains of scientific research. Transformers have been utilized in language modeling, outperforming previous approaches. Therefore, the use of deep learning as a tool for analyzing genomic sequences is promising, yielding convincing results in fields such as motif identification and variant calling. DeepMicrobes, a machine learning-based classifier, has recently been introduced for taxonomic prediction at the species and genus level. However, it relies on complex models based on bidirectional long short-term memory cells, resulting in slow runtimes and excessive memory requirements that hamper its usability. We present MetaTransformer, a self-attention-based deep learning tool for metagenomic analysis. Our transformer-encoder-based models enable efficient parallelization while outperforming DeepMicrobes in species- and genus-level classification. Furthermore, we investigate approaches to reduce memory consumption and boost performance using different embedding schemes. As a result, we achieve a 2× to 5× speedup for inference compared to DeepMicrobes while maintaining a significantly smaller memory footprint. MetaTransformer can be trained in 9 hours for genus and 16 hours for species prediction. Our results demonstrate the performance improvements gained from self-attention models and the impact of embedding schemes when applying deep learning to metagenomic sequencing data. |
format | Online Article Text |
id | pubmed-10495543 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10495543 2023-09-13 MetaTransformer: deep metagenomic sequencing read classification using self-attention models Wichmann, Alexander Buschong, Etienne Müller, André Jünger, Daniel Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil NAR Genom Bioinform Methods Article Deep learning has emerged as a paradigm that revolutionizes numerous domains of scientific research. Transformers have been utilized in language modeling, outperforming previous approaches. Therefore, the use of deep learning as a tool for analyzing genomic sequences is promising, yielding convincing results in fields such as motif identification and variant calling. DeepMicrobes, a machine learning-based classifier, has recently been introduced for taxonomic prediction at the species and genus level. However, it relies on complex models based on bidirectional long short-term memory cells, resulting in slow runtimes and excessive memory requirements that hamper its usability. We present MetaTransformer, a self-attention-based deep learning tool for metagenomic analysis. Our transformer-encoder-based models enable efficient parallelization while outperforming DeepMicrobes in species- and genus-level classification. Furthermore, we investigate approaches to reduce memory consumption and boost performance using different embedding schemes. As a result, we achieve a 2× to 5× speedup for inference compared to DeepMicrobes while maintaining a significantly smaller memory footprint. MetaTransformer can be trained in 9 hours for genus and 16 hours for species prediction. Our results demonstrate the performance improvements gained from self-attention models and the impact of embedding schemes when applying deep learning to metagenomic sequencing data. Oxford University Press 2023-09-11 /pmc/articles/PMC10495543/ /pubmed/37705831 http://dx.doi.org/10.1093/nargab/lqad082 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Article Wichmann, Alexander Buschong, Etienne Müller, André Jünger, Daniel Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil MetaTransformer: deep metagenomic sequencing read classification using self-attention models |
title | MetaTransformer: deep metagenomic sequencing read classification using self-attention models |
title_full | MetaTransformer: deep metagenomic sequencing read classification using self-attention models |
title_fullStr | MetaTransformer: deep metagenomic sequencing read classification using self-attention models |
title_full_unstemmed | MetaTransformer: deep metagenomic sequencing read classification using self-attention models |
title_short | MetaTransformer: deep metagenomic sequencing read classification using self-attention models |
title_sort | metatransformer: deep metagenomic sequencing read classification using self-attention models |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495543/ https://www.ncbi.nlm.nih.gov/pubmed/37705831 http://dx.doi.org/10.1093/nargab/lqad082 |
work_keys_str_mv | AT wichmannalexander metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels AT buschongetienne metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels AT mullerandre metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels AT jungerdaniel metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels AT hildebrandtandreas metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels AT hankelnthomas metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels AT schmidtbertil metatransformerdeepmetagenomicsequencingreadclassificationusingselfattentionmodels |