Complexity of the simplest species tree problem

The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly un...

Bibliographic Details
Main authors: Zhu, Tianqi; Yang, Ziheng
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382899/
https://www.ncbi.nlm.nih.gov/pubmed/33492385
http://dx.doi.org/10.1093/molbev/msab009
_version_ 1783741628186361856
author Zhu, Tianqi
Yang, Ziheng
author_facet Zhu, Tianqi
Yang, Ziheng
author_sort Zhu, Tianqi
collection PubMed
description The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
format Online
Article
Text
id pubmed-8382899
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-83828992021-08-25 Complexity of the simplest species tree problem Zhu, Tianqi Yang, Ziheng Mol Biol Evol Methods The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation. Oxford University Press 2021-01-25 /pmc/articles/PMC8382899/ /pubmed/33492385 http://dx.doi.org/10.1093/molbev/msab009 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods
Zhu, Tianqi
Yang, Ziheng
Complexity of the simplest species tree problem
title Complexity of the simplest species tree problem
title_full Complexity of the simplest species tree problem
title_fullStr Complexity of the simplest species tree problem
title_full_unstemmed Complexity of the simplest species tree problem
title_short Complexity of the simplest species tree problem
title_sort complexity of the simplest species tree problem
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382899/
https://www.ncbi.nlm.nih.gov/pubmed/33492385
http://dx.doi.org/10.1093/molbev/msab009
work_keys_str_mv AT zhutianqi complexityofthesimplestspeciestreeproblem
AT yangziheng complexityofthesimplestspeciestreeproblem