Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that co...

Bibliographic Details
Main Authors: Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244389/
https://www.ncbi.nlm.nih.gov/pubmed/28102365
http://dx.doi.org/10.1038/srep40712
author Zhang, Qian
Jun, Se-Ran
Leuze, Michael
Ussery, David
Nookaew, Intawat
collection PubMed
description The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
format Online
Article
Text
id pubmed-5244389
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5244389 2017-01-23 Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer. Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat. Sci Rep, Article. Nature Publishing Group, 2017-01-19. /pmc/articles/PMC5244389/ /pubmed/28102365 http://dx.doi.org/10.1038/srep40712 Text en. Copyright © 2017, The Author(s). http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5244389/
https://www.ncbi.nlm.nih.gov/pubmed/28102365
http://dx.doi.org/10.1038/srep40712