
Viral taxonomy derived from evolutionary genome relationships

We describe a new genome alignment-based model for understanding the diversity of viruses based on evolutionary genetic relationships. This approach uses information theory and a physical model to determine the information shared by the genes in two genomes. Pairwise comparisons of genes from the vi...


Bibliographic Details
Main authors: Dougan, Tyler J., Quake, Stephen R.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6693820/
https://www.ncbi.nlm.nih.gov/pubmed/31412051
http://dx.doi.org/10.1371/journal.pone.0220440
_version_ 1783443746298265600
author Dougan, Tyler J.
Quake, Stephen R.
author_facet Dougan, Tyler J.
Quake, Stephen R.
author_sort Dougan, Tyler J.
collection PubMed
description We describe a new genome alignment-based model for understanding the diversity of viruses based on evolutionary genetic relationships. This approach uses information theory and a physical model to determine the information shared by the genes in two genomes. Pairwise comparisons of genes from the viruses are created from alignments using NCBI BLAST, and their match scores are combined to produce a metric between genomes, which is in turn used to determine a global classification using the 5,817 viruses on RefSeq. In cases where there is no measurable alignment between any genes, the method falls back to a coarser measure of genome relationship: the mutual information of 4-mer frequency. This results in a principled model which depends only on the genome sequence, which captures many interesting relationships between viral families, and which creates clusters which correlate well with both the Baltimore and ICTV classifications. The incremental computational cost of classifying a novel virus is low and therefore newly discovered viruses can be quickly identified and classified. The model goes beyond alignment-free classifications by producing a full phylogeny similar to those constructed by virologists using qualitative features, while relying only on objective genes. These results bolster the case for mathematical models in microbiology which can characterize organisms using only their genetic material and provide an independent check for phylogenies constructed by humans, considerably faster and more cheaply than less modern approaches.
format Online
Article
Text
id pubmed-6693820
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6693820 2019-08-16 Viral taxonomy derived from evolutionary genome relationships Dougan, Tyler J. Quake, Stephen R. PLoS One Research Article
Public Library of Science 2019-08-14 /pmc/articles/PMC6693820/ /pubmed/31412051 http://dx.doi.org/10.1371/journal.pone.0220440 Text en © 2019 Dougan, Quake http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Dougan, Tyler J.
Quake, Stephen R.
Viral taxonomy derived from evolutionary genome relationships
title Viral taxonomy derived from evolutionary genome relationships
title_sort viral taxonomy derived from evolutionary genome relationships
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6693820/
https://www.ncbi.nlm.nih.gov/pubmed/31412051
http://dx.doi.org/10.1371/journal.pone.0220440
work_keys_str_mv AT dougantylerj viraltaxonomyderivedfromevolutionarygenomerelationships
AT quakestephenr viraltaxonomyderivedfromevolutionarygenomerelationships
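
Illustrative note on the 4-mer fallback named in the description: the record says that, when no genes align, the method falls back to "the mutual information of 4-mer frequency", but it does not give the formula. The sketch below is therefore only one possible reading, not the authors' implementation. It assumes the quantity is the mutual information between a genome label (A or B) and the identity of a 4-mer drawn from the pooled 4-mer counts of the two genomes. The function names kmer_counts and label_kmer_mutual_information are hypothetical and do not come from the article.

```python
from collections import Counter
from math import log2

K = 4  # k-mer length named in the description


def kmer_counts(seq, k=K):
    """Count overlapping k-mers that consist only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[kmer] += 1
    return counts


def label_kmer_mutual_information(seq_a, seq_b, k=K):
    """Mutual information, in bits, between genome label and k-mer identity.

    ASSUMPTION: this contingency-table definition is chosen for
    illustration; the catalog record does not state the paper's formula.
    """
    counts = [kmer_counts(seq_a, k), kmer_counts(seq_b, k)]
    totals = [sum(c.values()) for c in counts]
    if min(totals) == 0:
        raise ValueError("each sequence needs at least one valid k-mer")
    n = sum(totals)
    mi = 0.0
    for kmer in set(counts[0]) | set(counts[1]):
        p_kmer = (counts[0][kmer] + counts[1][kmer]) / n  # marginal over k-mers
        for c, total in zip(counts, totals):
            joint = c[kmer] / n  # P(label, k-mer)
            if joint > 0:
                mi += joint * log2(joint / (p_kmer * (total / n)))
    return mi
```

Under this assumed definition the value is near 0 bits when the two genomes have the same 4-mer composition and grows as their compositions diverge, so it behaves like a distance rather than a similarity. How the article scales or combines this value with the BLAST-based scores is not stated in this record.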