Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation

BACKGROUND: Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes....

Bibliographic Details
Main authors: Pratama, Akbar Adjie, Bolduc, Benjamin, Zayed, Ahmed A., Zhong, Zhi-Ping, Guo, Jiarong, Vik, Dean R., Gazitúa, Maria Consuelo, Wainaina, James M., Roux, Simon, Sullivan, Matthew B.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210812/
https://www.ncbi.nlm.nih.gov/pubmed/34178438
http://dx.doi.org/10.7717/peerj.11447
author Pratama, Akbar Adjie
Bolduc, Benjamin
Zayed, Ahmed A.
Zhong, Zhi-Ping
Guo, Jiarong
Vik, Dean R.
Gazitúa, Maria Consuelo
Wainaina, James M.
Roux, Simon
Sullivan, Matthew B.
author_sort Pratama, Akbar Adjie
collection PubMed
description BACKGROUND: Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tools, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). RESULTS: The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and blast-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and blast-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ∼5% for virome and ∼75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignments ranging from ∼95% (whole genomes) down to ∼80% (3 kbp-sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments rather than full genomes. Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. CONCLUSION: Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses ‘hidden’ in diverse sequence datasets.
format Online
Article
Text
id pubmed-8210812
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8210812 2021-06-25 PeerJ Bioinformatics PeerJ Inc. 2021-06-14 /pmc/articles/PMC8210812/ /pubmed/34178438 http://dx.doi.org/10.7717/peerj.11447 Text en ©2021 Pratama et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210812/
https://www.ncbi.nlm.nih.gov/pubmed/34178438
http://dx.doi.org/10.7717/peerj.11447