Evaluation of viral genome assembly and diversity estimation in deep metagenomes

Bibliographic Details
Main Authors: Aguirre de Cárcer, Daniel, Angly, Florent E, Alcamí, Antonio
Format: Online Article Text
Language: English
Published: BioMed Central, 2014
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4247695/
https://www.ncbi.nlm.nih.gov/pubmed/25407630
http://dx.doi.org/10.1186/1471-2164-15-989
collection PubMed
description BACKGROUND: Viruses have unique properties, such as small genomes and regions of high sequence similarity, whose effects on metagenomic assembly have not yet been characterized. This study uses diverse in silico simulated viromes to evaluate how extensively genomes can be assembled using different sequencing platforms and assemblers. It further investigates the suitability of different methods for estimating viral diversity in metagenomes. RESULTS: We created in silico metagenomes mimicking various platforms at different sequencing depths. The CLC assembler performed worse than IDBA_UD and CAMERA, which are designed specifically for metagenomes. Up to a saturation point, Illumina platforms proved more capable than 454 of reconstructing large portions of viral genomes. Read length was an important factor in limiting chimericity, while scaffolding only marginally improved contig length and accuracy. The genome length of the viruses in the metagenomes did not significantly affect genome reconstruction, but the co-existence of highly similar genomes was detrimental. When evaluating diversity estimation tools, we found that PHACCS results were more accurate than those from CatchAll and clustering, both of which exceeded the expected values by orders of magnitude. CONCLUSIONS: Assemblers designed specifically for metagenome analysis should be used to facilitate the creation of high-quality long contigs. Despite the high coverage achievable, scientists should not always expect to obtain complete genomes, because their reconstruction may be hindered by co-existing species bearing highly similar genomic regions. Further development of metagenomics-oriented assemblers may help bypass these limitations in future studies. Meanwhile, the lack of fully reconstructed communities keeps methods for estimating viral diversity relevant. While none of the three methods tested had absolute precision, only PHACCS was deemed suitable for comparative studies.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-989) contains supplementary material, which is available to authorized users.
id pubmed-4247695
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4247695 2014-12-02. Published in BMC Genomics (Research Article), BioMed Central, 2014-11-18.
© Aguirre de Cárcer et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article