Alignment-free d2* oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences

Bibliographic Details
Main authors: Ahlgren, Nathan A., Ren, Jie, Lu, Yang Young, Fuhrman, Jed A., Sun, Fengzhu
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224470/
https://www.ncbi.nlm.nih.gov/pubmed/27899557
http://dx.doi.org/10.1093/nar/gkw1002
collection PubMed
description Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure d2* at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, d2* host prediction accuracies with thresholding and consensus methods (genus-level: 64%) exceeded previous Euclidean distance ONF (32%) or homology-based (22-62%) methods. When applied to metagenomically assembled marine SUP05 viruses and the human gut virus crAssphage, d2*-based predictions overlapped (i.e. some same, some different) with the previously inferred hosts of these viruses. The extent of overlap improved when only using host genomes or metagenomic contigs from the same habitat or samples as the query viruses. The d2* ONF method will greatly improve the characterization of novel, metagenomic viruses.
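The abstract names the ingredients of the method without spelling them out. These are background-subtracted k-mer counts at k = 6, a maximum-score threshold, and a consensus over the 30 most similar hosts.

Below is a minimal Python sketch of that workflow. It assumes the usual form of d2* from the alignment-free literature. Each sequence's k-mer counts are centred on their expected values under a Markov background model fitted to that same sequence. For a word w with count X_w, among n k-mers, with background probability p_w, the standardised value is (X_w - n*p_w)/sqrt(n*p_w). Then d2* = (1 - cos θ)/2, where θ is the angle between the two sequences' vectors.

Several details are illustrative assumptions, not the authors' released tool or their exact settings:
- the helper names (kmer_counts, markov_probs, d2_star, predict_host);
- single-strand counting;
- the default Markov order of 0;
- the rule "predict only if the best score is at most max_score".

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand, skipping words with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def markov_probs(seq, k, order):
    """Expected probability of every k-mer under an order-`order` Markov chain,
    estimated from relative (order+1)-mer and order-mer frequencies of seq."""
    hi = kmer_counts(seq, order + 1)
    lo = kmer_counts(seq, order) if order > 0 else None
    n_hi = sum(hi.values())
    n_lo = sum(lo.values()) if lo is not None else 1
    if n_hi == 0 or n_lo == 0:
        raise ValueError("sequence too short for this Markov order")
    probs = {}
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        p = 1.0
        for i in range(k - order):  # overlapping (order+1)-mers of w
            p *= hi[w[i:i + order + 1]] / n_hi
        if p > 0.0 and lo is not None:
            for i in range(1, k - order):  # divide out the shared order-mers
                p /= lo[w[i:i + order]] / n_lo
        probs[w] = p
    return probs


def d2_star(seq_x, seq_y, k=6, order=0):
    """d2* dissimilarity in [0, 1]: (1 - cosine of centred, standardised counts) / 2."""
    if not 0 <= order <= k - 2:
        raise ValueError("need 0 <= order <= k - 2")
    cx, cy = kmer_counts(seq_x, k), kmer_counts(seq_y, k)
    px, py = markov_probs(seq_x, k, order), markov_probs(seq_y, k, order)
    nx, ny = sum(cx.values()), sum(cy.values())
    dot = sxx = syy = 0.0
    for w in px:
        ex, ey = nx * px[w], ny * py[w]  # expected counts under each background
        # A word with zero expected count cannot occur, so its centred count is 0.
        u = (cx[w] - ex) / math.sqrt(ex) if ex > 0 else 0.0
        v = (cy[w] - ey) / math.sqrt(ey) if ey > 0 else 0.0
        dot += u * v
        sxx += u * u
        syy += v * v
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("centred count vector is zero; d2* is undefined")
    return 0.5 * (1.0 - dot / math.sqrt(sxx * syy))


def predict_host(virus, hosts, k=6, order=0, top_n=30, max_score=None):
    """Plurality vote over the taxa of the top_n least-dissimilar hosts.

    hosts is a list of (taxon, genome) pairs. Returns None (no prediction)
    when the best score exceeds max_score (a hypothetical threshold rule).
    """
    scored = sorted((d2_star(virus, genome, k, order), taxon) for taxon, genome in hosts)
    if max_score is not None and scored[0][0] > max_score:
        return None
    votes = Counter(taxon for _, taxon in scored[:top_n])
    return votes.most_common(1)[0][0]
```

For example, predict_host(virus_seq, [("GenusA", genome_a), ("GenusB", genome_b)], top_n=30) returns the genus label that occurs most often among the 30 hosts with the lowest d2* scores. This sketch recomputes each host's k-mer profile on every call for clarity. A practical implementation would compute the host profiles once and reuse them across query viruses.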
id pubmed-5224470
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Nucleic Acids Res
dates 2017-01-09, 2016-11-28
rights © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
license http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Computational Biology