Genetic Distance for a General Non-Stationary Markov Substitution Process
The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as...
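The description field in this record defines genetic distance as the expected number of substitutions per site and extends that idea to nonstationary, time-homogeneous Markov models. For a single edge with rate matrix `Q`, a standard result for continuous-time Markov chains gives this quantity as `E(t) = ∫_0^t Σ_i p_i(s)(−Q_ii) ds` with `p(s) = p(0) exp(Qs)`. When `p(0)` is the stationary distribution `π`, this collapses to the familiar `−t Σ_i π_i Q_ii`. The following is a minimal, hypothetical sketch of that integral, not the authors' implementation. The function names are made up, rate matrices are plain nested lists, and the integral is computed with fourth-order Runge-Kutta on an augmented system instead of a matrix exponential.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _derivative(state: Sequence[float], Q: Matrix) -> List[float]:
    """Time derivative of the augmented state (p_1, ..., p_n, N).

    The probabilities follow the forward equation dp/ds = p Q, and the
    running substitution count N grows at rate sum_i p_i(s) * (-Q_ii).
    """
    n = len(Q)
    probs = state[:n]
    dp = [sum(probs[i] * Q[i][j] for i in range(n)) for j in range(n)]
    dN = sum(probs[i] * -Q[i][i] for i in range(n))
    return dp + [dN]


def expected_substitutions(Q: Matrix, p0: Sequence[float], t: float,
                           steps: int = 1000) -> float:
    """Hypothetical helper: expected substitutions over an edge of length t.

    Integrates the augmented system from s = 0 to s = t with classical
    fourth-order Runge-Kutta and returns the accumulated count N(t).
    """
    state = list(p0) + [0.0]
    h = t / steps
    for _ in range(steps):
        k1 = _derivative(state, Q)
        k2 = _derivative([x + 0.5 * h * k for x, k in zip(state, k1)], Q)
        k3 = _derivative([x + 0.5 * h * k for x, k in zip(state, k2)], Q)
        k4 = _derivative([x + h * k for x, k in zip(state, k3)], Q)
        state = [x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state[-1]


def stationary_expected_substitutions(Q: Matrix, pi: Sequence[float],
                                      t: float) -> float:
    """Standard stationary formulation: -t * sum_i pi_i * Q_ii."""
    return -t * sum(pi[i] * Q[i][i] for i in range(len(Q)))


if __name__ == "__main__":
    # Two-state chain whose stationary distribution is (1/3, 2/3).
    Q = [[-2.0, 2.0], [1.0, -1.0]]
    # Start in state 0, away from stationarity.
    print(expected_substitutions(Q, [1.0, 0.0], 1.0))   # about 1.5445
    print(4 / 3 + 2 / 9 * (1 - math.exp(-3)))           # closed form, same value
    # Start at pi: the general integral and the stationary shortcut agree.
    print(stationary_expected_substitutions(Q, [1 / 3, 2 / 3], 1.0))  # about 4/3
```

The two runs differ only in the starting distribution. The stationary shortcut is recovered exactly when `p(0) = π`, which mirrors the record's statement that the general measure reduces to the standard formulation under stationarity.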
Main authors: | Kaehler, Benjamin D.; Yap, Von Bing; Zhang, Rongli; Huttley, Gavin A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Subjects: | Regular Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380038/ https://www.ncbi.nlm.nih.gov/pubmed/25503772 http://dx.doi.org/10.1093/sysbio/syu106 |
author | Kaehler, Benjamin D.; Yap, Von Bing; Zhang, Rongli; Huttley, Gavin A. |
---|---|
collection | PubMed |
description | The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model. |
format | Online Article Text |
id | pubmed-4380038 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Syst Biol (section: Regular Articles); issue 2015-03, published online 2014-12-09 |
rights | © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Genetic Distance for a General Non-Stationary Markov Substitution Process |
topic | Regular Articles |
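The description field names the paralinear distance as the empirical pairwise method used for comparison. A commonly used form of that distance for nucleotide data is `d = −1/4 [ln det J − 1/2 ln(det Π_x · det Π_y)]`, where `J` is the 4×4 divergence matrix of aligned base-pair frequencies and `Π_x`, `Π_y` are diagonal matrices holding each sequence's base composition. The following is a minimal illustrative sketch of that formula, not the authors' code or any package's API. The function names are hypothetical, and alignment columns containing gaps or ambiguity codes are simply skipped, which is an assumption.

```python
import math
from typing import List, Sequence

BASES = "ACGT"


def _det(matrix: Sequence[Sequence[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def joint_frequencies(seq_x: str, seq_y: str) -> List[List[float]]:
    """Divergence matrix J[i][j]: frequency of base i in x aligned to base j in y.

    Columns where either sequence has a non-ACGT character are skipped (assumption).
    """
    counts = [[0] * 4 for _ in range(4)]
    total = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in BASES and b in BASES:
            counts[BASES.index(a)][BASES.index(b)] += 1
            total += 1
    if total == 0:
        raise ValueError("no comparable sites")
    return [[c / total for c in row] for row in counts]


def paralinear_from_joint(J: Sequence[Sequence[float]]) -> float:
    """d = -1/4 * [ln det J - 1/2 * ln(det Pi_x * det Pi_y)]."""
    pi_x = [sum(row) for row in J]        # base composition of sequence x
    pi_y = [sum(col) for col in zip(*J)]  # base composition of sequence y
    det_j = _det(J)
    det_px, det_py = math.prod(pi_x), math.prod(pi_y)  # diagonal matrices
    if det_j <= 0.0 or det_px <= 0.0 or det_py <= 0.0:
        raise ValueError("paralinear distance is undefined for this alignment")
    return -0.25 * (math.log(det_j) - 0.5 * (math.log(det_px) + math.log(det_py)))


def paralinear_distance(seq_x: str, seq_y: str) -> float:
    """Hypothetical convenience wrapper over two aligned sequences."""
    return paralinear_from_joint(joint_frequencies(seq_x, seq_y))


if __name__ == "__main__":
    print(paralinear_distance("ACGTACGTAC", "ACGTACGTAC"))      # identical: zero
    print(paralinear_distance("AACGTTGCAACG", "AACGTAGCATCG"))  # about 0.3466 (ln 2 / 2)
```

Two quick checks on the formula: identical sequences give a distance of zero, and a divergence matrix with Jukes-Cantor structure (mismatch proportion `p`, uniform composition) reproduces `−(3/4) ln(1 − 4p/3)`.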