
Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics

BACKGROUND: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools to enable taxonomic assignment, the range of hosts, and biological prope...


Bibliographic Details
Main authors: Lu, Congyu, Zhang, Zheng, Cai, Zena, Zhu, Zhaozhong, Qiu, Ye, Wu, Aiping, Jiang, Taijiao, Zheng, Heping, Peng, Yousong
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7807511/
https://www.ncbi.nlm.nih.gov/pubmed/33441133
http://dx.doi.org/10.1186/s12915-020-00938-6
author Lu, Congyu
Zhang, Zheng
Cai, Zena
Zhu, Zhaozhong
Qiu, Ye
Wu, Aiping
Jiang, Taijiao
Zheng, Heping
Peng, Yousong
collection PubMed
description BACKGROUND: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools that support taxonomic assignment and determination of the host range and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model to predict hosts for prokaryotic viruses with better performance than previous computational methods. RESULTS: We present Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, taking the differences in k-mer frequencies between viral and host genomic sequences as features. PHP achieved a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. This accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also substantially outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses whose hosts cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimum score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
CONCLUSIONS: The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
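The abstract names three ingredients: k-mer frequency differences between viral and host sequences as features, a minimum-score threshold, and a consensus over the top 30 predictions. The following Python sketch illustrates only those ideas and is not the PHP tool's code or API. The function names, the choice of k = 4, the (genus, score) input format, and the majority-vote form of the consensus are all assumptions made for illustration; the Gaussian scoring step itself is not reproduced, since the record does not describe it.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Normalized frequencies of all 4**k DNA k-mers in seq (k=4 is an assumption)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, T are counted; windows containing N are skipped.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def kmer_difference(virus_seq, host_seq, k=4):
    """Feature vector: per-k-mer frequency difference between virus and host."""
    virus = kmer_frequencies(virus_seq, k)
    host = kmer_frequencies(host_seq, k)
    return [v - h for v, h in zip(virus, host)]


def consensus_genus(scored_hosts, top_n=30, min_score=None):
    """Pick a genus by majority vote over the top_n highest-scoring hosts.

    scored_hosts is a hypothetical list of (genus, score) pairs where a
    higher score means a more likely host. Returns None when the list is
    empty or the best score falls below min_score (thresholding).
    """
    ranked = sorted(scored_hosts, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return None
    if min_score is not None and ranked[0][1] < min_score:
        return None
    votes = Counter(genus for genus, _ in ranked[:top_n])
    return votes.most_common(1)[0][0]
```

In this sketch, a virus would be compared against each candidate host genome with kmer_difference, each feature vector would be scored by some model, and consensus_genus would then turn the ranked scores into a single genus-level call or decline to predict.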
format Online
Article
Text
id pubmed-7807511
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7807511 2021-01-14 Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics Lu, Congyu Zhang, Zheng Cai, Zena Zhu, Zhaozhong Qiu, Ye Wu, Aiping Jiang, Taijiao Zheng, Heping Peng, Yousong BMC Biol Methodology Article BioMed Central 2021-01-14 /pmc/articles/PMC7807511/ /pubmed/33441133 http://dx.doi.org/10.1186/s12915-020-00938-6 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics
topic Methodology Article