An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data

Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the s...

Bibliographic Details
Main authors: Deng, Xutao, Naccache, Samia N., Ng, Terry, Federman, Scot, Li, Linlin, Chiu, Charles Y., Delwart, Eric L.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402509/
https://www.ncbi.nlm.nih.gov/pubmed/25586223
http://dx.doi.org/10.1093/nar/gkv002
author Deng, Xutao
Naccache, Samia N.
Ng, Terry
Federman, Scot
Li, Linlin
Chiu, Charles Y.
Delwart, Eric L.
collection PubMed
description Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, achieved significantly better contigs than current approaches.
format Online
Article
Text
id pubmed-4402509
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4402509 2015-04-29 Nucleic Acids Res Methods Online Oxford University Press 2015-04-20 2015-01-13 /pmc/articles/PMC4402509/ /pubmed/25586223 http://dx.doi.org/10.1093/nar/gkv002 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data
topic Methods Online