Alignment-free [Formula: see text] oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences
Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonom...
Main authors: | Ahlgren, Nathan A.; Ren, Jie; Lu, Yang Young; Fuhrman, Jed A.; Sun, Fengzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224470/ https://www.ncbi.nlm.nih.gov/pubmed/27899557 http://dx.doi.org/10.1093/nar/gkw1002 |
_version_ | 1782493365083832320 |
---|---|
author | Ahlgren, Nathan A.; Ren, Jie; Lu, Yang Young; Fuhrman, Jed A.; Sun, Fengzhu |
author_sort | Ahlgren, Nathan A. |
collection | PubMed |
description | Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure [Formula: see text] at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, [Formula: see text] host prediction accuracies with thresholding and consensus methods (genus-level: 64%) exceeded previous Euclidean distance ONF (32%) or homology-based (22-62%) methods. When applied to metagenomically-assembled marine SUP05 viruses and the human gut virus crAssphage, [Formula: see text]-based predictions overlapped (i.e. some same, some different) with the previously inferred hosts of these viruses. The extent of overlap improved when only using host genomes or metagenomic contigs from the same habitat or samples as the query viruses. The [Formula: see text] ONF method will greatly improve the characterization of novel, metagenomic viruses. |
format | Online Article Text |
id | pubmed-5224470 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5224470 2017-01-17 Nucleic Acids Res Computational Biology Oxford University Press 2017-01-09 2016-11-28 /pmc/articles/PMC5224470/ /pubmed/27899557 http://dx.doi.org/10.1093/nar/gkw1002 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Alignment-free [Formula: see text] oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5224470/ https://www.ncbi.nlm.nih.gov/pubmed/27899557 http://dx.doi.org/10.1093/nar/gkw1002 |
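
The host-prediction procedure summarized in the description (compute ONF vectors, rank candidate hosts by dissimilarity, apply a maximum-dissimilarity threshold, then take the consensus of the most similar hosts) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the background-subtracting measure appears in this record only as "[Formula: see text]", so plain Euclidean distance between k-mer frequency vectors is swapped in as a stand-in. The function names `onf_vector`, `euclidean`, and `predict_host`, the parameter names, and the choice to count k-mers on a single strand are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
import math


def onf_vector(seq, k=6):
    """Relative frequency of every k-mer over ACGT, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other letters (e.g. N) are ignored.
    total = sum(counts[w] for w in kmers)
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    return [counts[w] / total for w in kmers]


def euclidean(u, v):
    """Stand-in dissimilarity (assumption): Euclidean distance between ONF vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_host(virus_seq, hosts, k=6, max_dissimilarity=None, top_n=30):
    """Predict a host taxon for one virus.

    hosts is a list of (taxon_label, sequence) pairs.
    Returns a taxon label, or None when no prediction is made.
    """
    v = onf_vector(virus_seq, k)
    scored = sorted((euclidean(v, onf_vector(seq, k)), taxon) for taxon, seq in hosts)
    if not scored:
        return None
    # Thresholding: decline to predict if even the closest host is too dissimilar.
    if max_dissimilarity is not None and scored[0][0] > max_dissimilarity:
        return None
    # Consensus: most common taxon among the top_n most similar hosts.
    votes = Counter(taxon for _, taxon in scored[:top_n])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical sequences and labels, for illustration only).
toy_hosts = [
    ("GenusA", "ACGT" * 50),
    ("GenusA", "ACGT" * 40 + "ACGA"),
    ("GenusB", "GGCC" * 50),
]
toy_virus = "ACGT" * 30
prediction = predict_host(toy_virus, toy_hosts, k=2, top_n=2)
```

Any other dissimilarity can be dropped in by replacing `euclidean`; the thresholding and consensus steps do not depend on which measure is used.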