
A64 Viral sequence classification using deep learning algorithms

Sewage samples have a high potential benefit for surveillance of circulating pathogens because they are easy to obtain and reflect population-wide circulation of pathogens. These types of samples typically contain a great diversity of viruses. Therefore, one of the main challenges of metagenomic sequ...


Bibliographic Details
Main authors:	Nieuwenhuijse, David, Munnink, Bas Oude, Phan, My, Koopmans, Marion
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Abstract Overview
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6736044/ http://dx.doi.org/10.1093/ve/vez002.063

author	Nieuwenhuijse, David Munnink, Bas Oude Phan, My Koopmans, Marion
collection	PubMed
description	Sewage samples have a high potential benefit for surveillance of circulating pathogens because they are easy to obtain and reflect population-wide circulation of pathogens. These types of samples typically contain a great diversity of viruses. Therefore, one of the main challenges of metagenomic sequencing of sewage for surveillance is sequence annotation and interpretation. Especially for high-threat viruses, false positive signals can trigger unnecessary alerts, but true positives should not be missed. Annotation thus requires high sensitivity and specificity. To better interpret annotated reads for high-threat viruses, we attempt to determine how classifiable they are in a background of reads of closely related low-threat viruses. As an example, we attempted to distinguish reads of poliovirus, a virus of high public health importance, from reads of other enteroviruses. A sequence-based deep learning algorithm was used to classify reads as either polio or non-polio enterovirus. Short reads were generated from 500 polio and 2,000 non-polio enterovirus genomes as a training set. By training the algorithm on this dataset we try to determine, at the single-read level, which short reads can reliably be labeled as poliovirus and which cannot. After training the deep learning algorithm on the generated reads we were able to calculate the probability with which a read can be assigned to a poliovirus genome or a non-poliovirus genome. We show that the algorithm succeeds in classifying the reads with high accuracy. The probability of assigning a read to the correct class was related to the location in the genome to which the read mapped, which conformed with our expectations, since some regions of the genome are more conserved than others. Classifying short reads of high-threat viral pathogens seems to be a promising application of sequence-based deep learning algorithms. Also, recent developments in software and hardware have facilitated the development and training of deep learning algorithms. Further plans for this work are to characterize the hard-to-classify regions of the poliovirus genome, build larger training databases, and extend the current approach to other viruses.
format	Online Article Text
id	pubmed-6736044
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6736044 2019-09-16 Virus Evol Abstract Overview Oxford University Press 2019-08-22 /pmc/articles/PMC6736044/ http://dx.doi.org/10.1093/ve/vez002.063 Text en © Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access publication distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	A64 Viral sequence classification using deep learning algorithms
topic	Abstract Overview
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6736044/ http://dx.doi.org/10.1093/ve/vez002.063
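
The abstract says that short reads were generated from polio and non-polio enterovirus genomes and that classification confidence depended on where in the genome a read mapped. The sketch below illustrates one way such a labeled training set could be built while keeping each read's start position for a later per-position analysis. It is only an illustration under stated assumptions: the abstract does not give the read length, the read simulator, the sequence encoding, or the network architecture, so the sliding-window approach, the one-hot encoding, and every name here (`generate_reads`, `one_hot`, `build_training_set`) are hypothetical, and the sequences are made-up toy strings rather than viral genomes.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"  # assumed alphabet; any other symbol (e.g. N) encodes as all zeros


def generate_reads(genome: str, read_len: int, step: int = 1) -> List[Tuple[int, str]]:
    """Cut a genome into overlapping reads, keeping each read's start position."""
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    last_start = len(genome) - read_len
    return [(s, genome[s:s + read_len]) for s in range(0, last_start + 1, step)]


def one_hot(read: str) -> List[List[int]]:
    """Encode a read as one 4-element row per base, in A, C, G, T order."""
    return [[1 if b == base else 0 for base in BASES] for b in read.upper()]


def build_training_set(
    genomes_by_label: Dict[int, List[str]], read_len: int, step: int = 1
) -> List[Tuple[List[List[int]], int, int]]:
    """Return (encoded_read, label, start_position) for every simulated read."""
    examples = []
    for label, genomes in genomes_by_label.items():
        for genome in genomes:
            for start, read in generate_reads(genome, read_len, step):
                examples.append((one_hot(read), label, start))
    return examples


if __name__ == "__main__":
    # Toy input: label 1 stands for "polio", label 0 for "non-polio enterovirus".
    toy = {1: ["ACGTACGTAC"], 0: ["TTGACCATGA"]}
    data = build_training_set(toy, read_len=4, step=2)
    print(len(data), "examples; first start position:", data[0][2])
```

Keeping the start position alongside each example is what would allow the predicted probability of the correct class to be averaged per genomic position afterwards, which is the kind of relationship the abstract reports. A real pipeline would feed the encoded reads to a classifier of the reader's choice.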
