An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data
Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the s...
Main Authors: | Deng, Xutao, Naccache, Samia N., Ng, Terry, Federman, Scot, Li, Linlin, Chiu, Charles Y., Delwart, Eric L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2015 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402509/ https://www.ncbi.nlm.nih.gov/pubmed/25586223 http://dx.doi.org/10.1093/nar/gkv002 |
_version_ | 1782367265928249344 |
---|---|
author | Deng, Xutao Naccache, Samia N. Ng, Terry Federman, Scot Li, Linlin Chiu, Charles Y. Delwart, Eric L. |
author_sort | Deng, Xutao |
collection | PubMed |
description | Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, produced significantly better contigs than current approaches. |
format | Online Article Text |
id | pubmed-4402509 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-44025092015-04-29 An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data Deng, Xutao Naccache, Samia N. Ng, Terry Federman, Scot Li, Linlin Chiu, Charles Y. Delwart, Eric L. Nucleic Acids Res Methods Online Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarities to annotated genes to confidently predict gene function or homology. Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also proposed new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy tested using in silico spike-in, clinical and environmental NGS datasets achieved significantly better contigs than current approaches. Oxford University Press 2015-04-20 2015-01-13 /pmc/articles/PMC4402509/ /pubmed/25586223 http://dx.doi.org/10.1093/nar/gkv002 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402509/ https://www.ncbi.nlm.nih.gov/pubmed/25586223 http://dx.doi.org/10.1093/nar/gkv002 |