Genetic Distance for a General Non-Stationary Markov Substitution Process

The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as...

Bibliographic Details
Main authors: Kaehler, Benjamin D., Yap, Von Bing, Zhang, Rongli, Huttley, Gavin A.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380038/
https://www.ncbi.nlm.nih.gov/pubmed/25503772
http://dx.doi.org/10.1093/sysbio/syu106
_version_ 1782364279363600384
author Kaehler, Benjamin D.
Yap, Von Bing
Zhang, Rongli
Huttley, Gavin A.
collection PubMed
description The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.
format Online
Article
Text
id pubmed-4380038
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4380038 2015-04-15 Genetic Distance for a General Non-Stationary Markov Substitution Process Kaehler, Benjamin D. Yap, Von Bing Zhang, Rongli Huttley, Gavin A. Syst Biol Regular Articles
Oxford University Press 2015-03 2014-12-09 /pmc/articles/PMC4380038/ /pubmed/25503772 http://dx.doi.org/10.1093/sysbio/syu106 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genetic Distance for a General Non-Stationary Markov Substitution Process
topic Regular Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380038/
https://www.ncbi.nlm.nih.gov/pubmed/25503772
http://dx.doi.org/10.1093/sysbio/syu106