efam: an expanded, metaproteome-supported HMM profile database of viral protein families

Bibliographic Details
Main authors: Zayed, Ahmed A, Lücking, Dominik, Mohssen, Mohamed, Cronin, Dylan, Bolduc, Ben, Gregory, Ann C, Hargreaves, Katherine R, Piehowski, Paul D, White III, Richard A, Huang, Eric L, Adkins, Joshua N, Roux, Simon, Moraru, Cristina, Sullivan, Matthew B
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502166/
https://www.ncbi.nlm.nih.gov/pubmed/34132786
http://dx.doi.org/10.1093/bioinformatics/btab451
Collection: PubMed
Description:
MOTIVATION: Viruses infect, reprogram and kill microbes, leading to profound ecosystem consequences, from elemental cycling in oceans and soils to microbiome-modulated diseases in plants and animals. Although metagenomic datasets are increasingly available, identifying viruses in them is challenging due to poor representation and annotation of viral sequences in databases.
RESULTS: Here, we establish efam, an expanded collection of Hidden Markov Model (HMM) profiles that represent viral protein families conservatively identified from the Global Ocean Virome 2.0 dataset. This resulted in 240 311 HMM profiles, each with at least 2 protein sequences, making efam >7-fold larger than the next largest, pan-ecosystem viral HMM profile database. Adjusting the criteria for viral contig confidence from ‘conservative’ to ‘eXtremely Conservative’ resulted in 37 841 HMM profiles in our efam-XC database. To assess the value of this resource, we integrated efam-XC into the VirSorter viral discovery software to discover viruses from less-studied, ecologically distinct oxygen minimum zone (OMZ) marine habitats. This expanded database increased the number of viruses recovered from every tested OMZ virome by ∼24% on average (up to ∼42%) and especially improved the recovery of often-missed shorter contigs (<5 kb). Additionally, to help elucidate lesser-known viral protein functions, we annotated the profiles using multiple databases from the DRAM pipeline and virion-associated metaproteomic data, which doubled the number of annotations obtainable by standard, single-database annotation approaches. Together, these marine resources (efam and efam-XC) are provided as searchable, compressed HMM databases that will be updated bi-annually to help maximize viral sequence discovery and study from any ecosystem.
AVAILABILITY AND IMPLEMENTATION: The resources are available on the iVirus platform at doi.org/10.25739/9vze-4143.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Record ID: pubmed-9502166
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics
Publication date: 2021-06-16
License: © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Original Papers