An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times

The explosive growth of molecular sequence data has made it possible to estimate species divergence times under relaxed-clock models using genome-scale data sets with many gene loci. To improve model realism and to best extract information about relative divergence times in the sequenc...

Bibliographic Details
Main Authors: Angelis, Konstantinos, Álvarez-Carretero, Sandra, Dos Reis, Mario, Yang, Ziheng
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5790132/
https://www.ncbi.nlm.nih.gov/pubmed/29029343
http://dx.doi.org/10.1093/sysbio/syx061
author Angelis, Konstantinos
Álvarez-Carretero, Sandra
Dos Reis, Mario
Yang, Ziheng
collection PubMed
description The explosive growth of molecular sequence data has made it possible to estimate species divergence times under relaxed-clock models using genome-scale data sets with many gene loci. To improve model realism and to best extract information about relative divergence times in the sequence data, it is important to account for the heterogeneity in the evolutionary process across genes or genomic regions. Partitioning is a commonly used approach to achieve those goals. We group sites that have similar evolutionary characteristics into the same partition and those with different characteristics into different partitions, and then use different models or different values of model parameters for different partitions to account for the among-partition heterogeneity. However, how to partition data in practical phylogenetic analysis, and in particular in relaxed-clock dating analysis, is more art than science. Here, we use computer simulation and real data analysis to study the impact of the partition scheme on divergence time estimation. The partition schemes had relatively minor effects on the accuracy of posterior time estimates when the prior assumptions were correct and the clock was not seriously violated, but showed large differences when the clock was seriously violated, when the fossil calibrations were in conflict or incorrect, or when the rate prior was mis-specified. Concatenation produced the widest posterior intervals and thus the least precise estimates. Use of many partitions increased the precision, as predicted by the infinite-sites theory, but the posterior intervals might fail to include the true ages because of conflicting fossil calibrations or mis-specified rate priors. We analyzed a data set of 78 plastid genes from 15 plant species with serious clock violation and showed that time estimates differed significantly among partition schemes, irrespective of the rate drift model used. Multiple and precise fossil calibrations reduced the differences among partition schemes and were important for improving the precision of divergence time estimates. While the use of many partitions is an important approach to reducing the uncertainty in posterior time estimates, we do not recommend its general use for the present, given the limitations of current models of rate drift for partitioned data and the challenges of interpreting the fossil evidence to construct accurate and informative calibrations.
format Online
Article
Text
id pubmed-5790132
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5790132 2018-02-05 Syst Biol Regular Articles Oxford University Press 2018-01 2017-07-04 /pmc/articles/PMC5790132/ /pubmed/29029343 http://dx.doi.org/10.1093/sysbio/syx061 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of the Systematic Biologists. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For Permissions, please email: journals.permissions@oup.com
title An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times
topic Regular Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5790132/
https://www.ncbi.nlm.nih.gov/pubmed/29029343
http://dx.doi.org/10.1093/sysbio/syx061