Assessing the Impact of Assemblers on Virus Detection in a De Novo Metagenomic Analysis Pipeline


Bibliographic Details
Main Authors: White, Daniel J.; Wang, Jing; Hall, Richard J.
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc. 2017
Subjects: Research Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610382/
https://www.ncbi.nlm.nih.gov/pubmed/28414526
http://dx.doi.org/10.1089/cmb.2017.0008
collection PubMed
description Applying high-throughput sequencing to pathogen discovery is a relatively new field whose objective is to find disease-causing agents when little or no background information on the disease is available. Key steps in the process are the generation of millions of sequence reads from an infected tissue sample, assembly of these reads into longer, contiguous stretches of nucleotide sequence (contigs), and identification of the contigs by matching them against known databases, such as those stored at GenBank or Ensembl. This technique, de novo metagenomics, is particularly useful when the pathogen is viral and strong discriminatory power can be achieved. Recently, however, we found that strikingly different results can be obtained when different assemblers are used. In this study, we formally test the impact of five popular assemblers (MIRA, VELVET, METAVELVET, SPADES, and OMEGA) on the detection of a novel virus and the assembly of its whole genome, using a data set in which the presence of the virus has been confirmed by empirical laboratory techniques, and we compare overall performance between assemblers. Our results show that if results from only one assembler are considered, biologically important reads can easily be overlooked. The impact of these results on the field of pathogen discovery is considered.
id pubmed-5610382
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5610382 2017-09-25 J Comput Biol Research Articles Mary Ann Liebert, Inc. 2017-09-01 Text en © Daniel J. White et al., 2017. Published by Mary Ann Liebert, Inc.
This Open Access article is distributed under the terms of the Creative Commons Attribution Noncommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
topic Research Articles
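
The abstract in this record reports that biologically important reads can be overlooked when results from only one assembler are considered. A minimal Python sketch of how detections from several assemblers could be cross-tabulated is given here. It is not the authors' pipeline: the function names are hypothetical, and the labels in the example input (virus_X, phage_Y, host) are made-up placeholders, not data from the study.

```python
from collections import defaultdict


def compare_assemblers(hits_by_assembler):
    """Map each detected taxon to the sorted list of assemblers that reported it.

    hits_by_assembler: dict of assembler name -> set of taxon labels
    (an assumed input shape, e.g. derived from database matches of contigs).
    """
    support = defaultdict(set)
    for assembler, taxa in hits_by_assembler.items():
        for taxon in taxa:
            support[taxon].add(assembler)
    return {taxon: sorted(names) for taxon, names in support.items()}


def missed_by_some(hits_by_assembler):
    """Return taxa that at least one assembler failed to report,
    mapped to the sorted list of assemblers that missed them."""
    all_assemblers = set(hits_by_assembler)
    support = compare_assemblers(hits_by_assembler)
    return {
        taxon: sorted(all_assemblers - set(found))
        for taxon, found in support.items()
        if set(found) != all_assemblers
    }


# Hypothetical example input; these labels are placeholders, not study results.
example_hits = {
    "MIRA": {"virus_X", "host"},
    "VELVET": {"host"},
    "SPADES": {"virus_X", "host", "phage_Y"},
}

if __name__ == "__main__":
    for taxon, missing in sorted(missed_by_some(example_hits).items()):
        print(f"{taxon}: not reported by {', '.join(missing)}")
```

Taking the union across assemblers, rather than trusting a single one, is the design choice this sketch illustrates: any taxon listed by `missed_by_some` would have gone unnoticed had only one of the listed assemblers been run.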