
DETIRE: a hybrid deep learning model for identifying viral sequences from metagenomes

A metagenome contains all DNA sequences from an environmental sample, including viruses, bacteria, archaea, and eukaryotes. Since viruses are of huge abundance and have caused vast mortality and morbidity to human society in history as a type of major pathogens, detecting viruses from metagenomes pl...


Bibliographic Details
Main authors: Miao, Yan, Bian, Jilong, Dong, Guanghui, Dai, Tianhong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10313334/
https://www.ncbi.nlm.nih.gov/pubmed/37396369
http://dx.doi.org/10.3389/fmicb.2023.1169791
author Miao, Yan
Bian, Jilong
Dong, Guanghui
Dai, Tianhong
collection PubMed
description A metagenome contains all DNA sequences from an environmental sample, including those of viruses, bacteria, archaea, and eukaryotes. Because viruses are highly abundant and, as major pathogens, have caused vast mortality and morbidity throughout human history, detecting viruses in metagenomes plays a crucial role in analyzing the viral component of samples and is the first step toward clinical diagnosis. However, detecting viral fragments directly from metagenomes remains difficult because they contain a huge number of short sequences. In this study, a hybrid Deep lEarning model for idenTifying vIral sequences fRom mEtagenomes (DETIRE) is proposed to solve this problem. First, a graph-based nucleotide sequence embedding strategy enriches the representation of DNA sequences by training an embedding matrix. Then, spatial and sequential features are extracted by trained CNN and BiLSTM networks, respectively, to enrich the features of short sequences. Finally, the two sets of features are combined by weighting to make the final decision. Trained on 220,000 sequences of 500 bp subsampled from the Virus and Host RefSeq genomes, DETIRE identifies more short viral sequences (<1,000 bp) than three recent methods: DeepVirFinder, PPR-Meta, and CHEER. DETIRE is freely available on GitHub (https://github.com/crazyinter/DETIRE).
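Illustrative note (not part of the catalogued record): the abstract mentions two steps that are easy to picture in code, namely subsampling fixed-length 500 bp fragments from reference genomes and combining the outputs of two branches by weighting. The Python sketch below is only a rough illustration under stated assumptions. The function names, the `weight` and `threshold` parameters, and the choice to fuse two per-sequence scores (rather than feature vectors, as the abstract describes) are all hypothetical and are not taken from the DETIRE repository.

```python
import random


def subsample_fragments(genome, length=500, count=3, seed=0):
    """Draw `count` random fragments of `length` bp from a genome string.

    Hypothetical helper: the abstract only states that 500 bp sequences
    were subsampled from RefSeq genomes, not how they were drawn.
    """
    if len(genome) < length:
        return []  # genome too short to yield a full-length fragment
    rng = random.Random(seed)  # seeded for reproducible sampling
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(count)]
    return [genome[s:s + length] for s in starts]


def combine_scores(cnn_score, bilstm_score, weight=0.5):
    """Weighted average of two branch scores (assumed to lie in [0, 1]).

    Assumption: a simple convex combination stands in for the weighted
    combination of CNN and BiLSTM features described in the abstract.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * cnn_score + (1.0 - weight) * bilstm_score


def classify(cnn_score, bilstm_score, weight=0.5, threshold=0.5):
    """Label a sequence 'virus' if the combined score reaches the threshold."""
    combined = combine_scores(cnn_score, bilstm_score, weight)
    return "virus" if combined >= threshold else "host"


if __name__ == "__main__":
    toy_genome = "ACGT" * 300  # 1,200 bp toy sequence, not real data
    fragments = subsample_fragments(toy_genome, length=500, count=4)
    print(len(fragments), [len(f) for f in fragments])
    print(classify(0.9, 0.7, weight=0.6))
```

In this sketch the weight is a fixed constant; a trained model would more plausibly learn it or tune it on validation data.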
format Online
Article
Text
id pubmed-10313334
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10313334 2023-07-01 DETIRE: a hybrid deep learning model for identifying viral sequences from metagenomes. Miao, Yan; Bian, Jilong; Dong, Guanghui; Dai, Tianhong. Front Microbiol. Microbiology. Frontiers Media S.A. 2023-06-16 /pmc/articles/PMC10313334/ /pubmed/37396369 http://dx.doi.org/10.3389/fmicb.2023.1169791 Text en Copyright © 2023 Miao, Bian, Dong and Dai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title DETIRE: a hybrid deep learning model for identifying viral sequences from metagenomes
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10313334/
https://www.ncbi.nlm.nih.gov/pubmed/37396369
http://dx.doi.org/10.3389/fmicb.2023.1169791