Cargando…
Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci
Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In par...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Oxford University Press
2015
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380039/ https://www.ncbi.nlm.nih.gov/pubmed/25503979 http://dx.doi.org/10.1093/sysbio/syu109 |
_version_ | 1782364279585898496 |
---|---|
author | Zhu, Tianqi Dos Reis, Mario Yang, Ziheng |
author_facet | Zhu, Tianqi Dos Reis, Mario Yang, Ziheng |
author_sort | Zhu, Tianqi |
collection | PubMed |
description | Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information in the fossil record to calibrate the molecular tree. Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data are used in the analysis. In a previous study we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation in analysis of large but finite sequence data sets under a strict molecular clock. As most modern clock dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to the case of relaxed clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: Sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci ([Formula: see text]) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/ [Formula: see text] , and the limit is independent of the number of sites in the sequence alignment. We then confirmed the predictions by using computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species. Our results suggest that with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way for improving the precision of posterior time estimation. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates. |
format | Online Article Text |
id | pubmed-4380039 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-43800392015-04-15 Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci Zhu, Tianqi Dos Reis, Mario Yang, Ziheng Syst Biol Regular Articles Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information in the fossil record to calibrate the molecular tree. Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data are used in the analysis. In a previous study we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation in analysis of large but finite sequence data sets under a strict molecular clock. As most modern clock dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to the case of relaxed clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: Sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci ([Formula: see text]) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/ [Formula: see text] , and the limit is independent of the number of sites in the sequence alignment. We then confirmed the predictions by using computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species. Our results suggest that with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way for improving the precision of posterior time estimation. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates. Oxford University Press 2015-03 2014-12-11 /pmc/articles/PMC4380039/ /pubmed/25503979 http://dx.doi.org/10.1093/sysbio/syu109 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Regular Articles Zhu, Tianqi Dos Reis, Mario Yang, Ziheng Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci |
title | Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci |
title_full | Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci |
title_fullStr | Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci |
title_full_unstemmed | Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci |
title_short | Characterization of the Uncertainty of Divergence Time Estimation under Relaxed Molecular Clock Models Using Multiple Loci |
title_sort | characterization of the uncertainty of divergence time estimation under relaxed molecular clock models using multiple loci |
topic | Regular Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4380039/ https://www.ncbi.nlm.nih.gov/pubmed/25503979 http://dx.doi.org/10.1093/sysbio/syu109 |
work_keys_str_mv | AT zhutianqi characterizationoftheuncertaintyofdivergencetimeestimationunderrelaxedmolecularclockmodelsusingmultipleloci AT dosreismario characterizationoftheuncertaintyofdivergencetimeestimationunderrelaxedmolecularclockmodelsusingmultipleloci AT yangziheng characterizationoftheuncertaintyofdivergencetimeestimationunderrelaxedmolecularclockmodelsusingmultipleloci |