
ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples

Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as “unknown” since many of the sequences are not similar to known genomes. In this work, we developed...


Bibliographic Details
Main Authors: Tampuu, Ardi; Bzhalava, Zurab; Dillner, Joakim; Vicente, Raul
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6738585/
https://www.ncbi.nlm.nih.gov/pubmed/31509583
http://dx.doi.org/10.1371/journal.pone.0222271
author Tampuu, Ardi
Bzhalava, Zurab
Dillner, Joakim
Vicente, Raul
collection PubMed
description Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as “unknown” since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as “unknown” by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.
format Online
Article
Text
id pubmed-6738585
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6738585 2019-09-20 ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples Tampuu, Ardi Bzhalava, Zurab Dillner, Joakim Vicente, Raul PLoS One Research Article
Public Library of Science 2019-09-11 /pmc/articles/PMC6738585/ /pubmed/31509583 http://dx.doi.org/10.1371/journal.pone.0222271 Text en © 2019 Tampuu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6738585/
https://www.ncbi.nlm.nih.gov/pubmed/31509583
http://dx.doi.org/10.1371/journal.pone.0222271