Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity

BACKGROUND: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards...

Bibliographic Details
Main Authors: Roux, Simon; Emerson, Joanne B.; Eloe-Fadrosh, Emiley A.; Sullivan, Matthew B.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610896/
https://www.ncbi.nlm.nih.gov/pubmed/28948103
http://dx.doi.org/10.7717/peerj.3817
author Roux, Simon
Emerson, Joanne B.
Eloe-Fadrosh, Emiley A.
Sullivan, Matthew B.
collection PubMed
description BACKGROUND: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates.
RESULTS: Tools designed for metagenomes, specifically metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but were more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates.
CONCLUSIONS: These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations, together with improved availability of reference viral genomes, will alleviate many of viromics' remaining limitations.
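The three detection thresholds listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes, purely for illustration, that each mapped read is already available as an interval on the contig with a percent identity; the names `Alignment`, `covered_fraction`, and `is_detected` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds quoted in the abstract.
MIN_CONTIG_LENGTH_BP = 10_000   # (i) contigs of >=10 kb define populations
MIN_READ_IDENTITY_PCT = 90.0    # (ii) only reads mapping at >=90% identity count
MIN_COVERED_FRACTION = 0.75     # (iii) >=75% of contig length at >=1x coverage


@dataclass
class Alignment:
    """Hypothetical record for one read mapped to a contig."""
    start: int           # 0-based, inclusive
    end: int             # 0-based, exclusive
    identity_pct: float  # percent identity of the read to the contig


def covered_fraction(contig_length: int, alignments: Iterable[Alignment]) -> float:
    """Fraction of contig positions covered at >=1x by reads passing the identity cut-off."""
    covered = [False] * contig_length
    for aln in alignments:
        if aln.identity_pct < MIN_READ_IDENTITY_PCT:
            continue  # read is too divergent to contribute coverage
        for pos in range(max(0, aln.start), min(contig_length, aln.end)):
            covered[pos] = True
    return sum(covered) / contig_length


def is_detected(contig_length: int, alignments: Iterable[Alignment]) -> bool:
    """Apply the length, identity, and breadth-of-coverage thresholds together."""
    if contig_length < MIN_CONTIG_LENGTH_BP:
        return False
    return covered_fraction(contig_length, alignments) >= MIN_COVERED_FRACTION
```

In this sketch a 12 kb contig with high-identity reads spanning 9 kb passes (exactly 75% covered), whereas the same contig fails if part of that span comes only from reads below 90% identity. A per-position boolean list is used for clarity rather than speed; real pipelines would typically derive coverage from an alignment file.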
format Online
Article
Text
id pubmed-5610896
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5610896 2017-09-25 PeerJ Bioinformatics PeerJ Inc. 2017-09-21 /pmc/articles/PMC5610896/ /pubmed/28948103 http://dx.doi.org/10.7717/peerj.3817 Text en ©2017 Roux et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5610896/
https://www.ncbi.nlm.nih.gov/pubmed/28948103
http://dx.doi.org/10.7717/peerj.3817