
Interpretable detection of novel human viruses from genome sequencing data


Bibliographic Details
Main authors:	Bartoszewicz, Jakub M; Seidel, Anja; Renard, Bernhard Y
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Standard Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7849996/ https://www.ncbi.nlm.nih.gov/pubmed/33554119 http://dx.doi.org/10.1093/nargab/lqab004

author	Bartoszewicz, Jakub M; Seidel, Anja; Renard, Bernhard Y
collection	PubMed
description	Viruses evolve extremely quickly, so reliable methods for viral host prediction are necessary to safeguard biosecurity and biosafety alike. Novel human-infecting viruses are difficult to detect with standard bioinformatics workflows. Here, we predict whether a virus can infect humans directly from next-generation sequencing (NGS) reads. We show that deep neural architectures significantly outperform both shallow machine learning and standard, homology-based algorithms, cutting the error rates in half and generalizing to taxonomic units distant from those presented during training. Further, we develop a suite of interpretability tools and show that it can also be applied to other models beyond the host prediction task. We propose a new approach for convolutional filter visualization to disentangle the information content of each nucleotide from its contribution to the final classification decision. Nucleotide-resolution maps of the learned associations between pathogen genomes and the infectious phenotype can be used to detect regions of interest in novel agents, for example, the SARS-CoV-2 coronavirus, which was unknown before it caused the COVID-19 pandemic in 2020. All methods presented here are implemented as easy-to-install packages, which not only enable analysis of NGS datasets without requiring any deep learning skills, but also allow advanced users to easily train and explain new models for genomics.
format	Online Article Text
id	pubmed-7849996
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	NAR Genom Bioinform. Oxford University Press 2021-02-01. © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Interpretable detection of novel human viruses from genome sequencing data
topic	Standard Article
