Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that co...
Main authors: | Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244389/ https://www.ncbi.nlm.nih.gov/pubmed/28102365 http://dx.doi.org/10.1038/srep40712 |
_version_ | 1782496690538807296 |
---|---|
author | Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat |
author_facet | Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat |
author_sort | Zhang, Qian |
collection | PubMed |
description | The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. |
format | Online Article Text |
id | pubmed-5244389 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-5244389 2017-01-23 Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer Zhang, Qian Jun, Se-Ran Leuze, Michael Ussery, David Nookaew, Intawat Sci Rep Article The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. Nature Publishing Group 2017-01-19 /pmc/articles/PMC5244389/ /pubmed/28102365 http://dx.doi.org/10.1038/srep40712 Text en Copyright © 2017, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Zhang, Qian Jun, Se-Ran Leuze, Michael Ussery, David Nookaew, Intawat Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer |
title | Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer |
title_full | Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer |
title_fullStr | Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer |
title_full_unstemmed | Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer |
title_short | Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer |
title_sort | viral phylogenomics using an alignment-free method: a three-step approach to determine optimal length of k-mer |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244389/ https://www.ncbi.nlm.nih.gov/pubmed/28102365 http://dx.doi.org/10.1038/srep40712 |
work_keys_str_mv | AT zhangqian viralphylogenomicsusinganalignmentfreemethodathreestepapproachtodetermineoptimallengthofkmer AT junseran viralphylogenomicsusinganalignmentfreemethodathreestepapproachtodetermineoptimallengthofkmer AT leuzemichael viralphylogenomicsusinganalignmentfreemethodathreestepapproachtodetermineoptimallengthofkmer AT usserydavid viralphylogenomicsusinganalignmentfreemethodathreestepapproachtodetermineoptimallengthofkmer AT nookaewintawat viralphylogenomicsusinganalignmentfreemethodathreestepapproachtodetermineoptimallengthofkmer |
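
The description above names k-mers as genomic features and lists the average number of common features among genomes and the Shannon diversity index as two of the criteria for choosing k. The following is a minimal Python sketch of those two quantities, not the authors' implementation: the function names, the toy sequences, and the use of the natural logarithm are all assumptions made for illustration, and cumulative relative entropy is omitted because the record does not define it.

```python
import math
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_index(counts: Counter) -> float:
    """Shannon diversity index H = -sum(p * ln p) over k-mer frequencies.

    The natural logarithm is an assumption; the record does not state a base.
    """
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def common_features(a: Counter, b: Counter) -> int:
    """Number of distinct k-mers that occur in both k-mer profiles."""
    return len(a.keys() & b.keys())


# Toy sequences invented for illustration only; they are not viral genomes.
genome_a = "ACGTACGTGA"
genome_b = "ACGTTTGACA"

# Scan a few candidate values of k and report both quantities for each.
for k in (2, 3, 4):
    profile_a = kmer_counts(genome_a, k)
    profile_b = kmer_counts(genome_b, k)
    print(k, common_features(profile_a, profile_b), round(shannon_index(profile_a), 3))
```

With many genomes, the shared-feature count would be averaged over all genome pairs for each k, and the resulting curves compared across k; how the study combines the three criteria into a single choice of k is not stated in this record.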