
A maximum pseudo-likelihood approach for estimating species trees under the coalescent model

BACKGROUND: Several phylogenetic approaches have been developed to estimate species trees from collections of gene trees. However, maximum likelihood approaches for estimating species trees under the coalescent model are limited. Although the likelihood of a species tree under the multispecies coale...


Bibliographic Details
Main authors: Liu, Liang; Yu, Lili; Edwards, Scott V
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2976751/
https://www.ncbi.nlm.nih.gov/pubmed/20937096
http://dx.doi.org/10.1186/1471-2148-10-302
author Liu, Liang
Yu, Lili
Edwards, Scott V
collection PubMed
description BACKGROUND: Several phylogenetic approaches have been developed to estimate species trees from collections of gene trees. However, maximum likelihood approaches for estimating species trees under the coalescent model are limited. Although the likelihood of a species tree under the multispecies coalescent model has already been derived by Rannala and Yang, it can be shown that the maximum likelihood estimate (MLE) of the species tree (topology, branch lengths, and population sizes) from gene trees under this formula does not exist. In this paper, we develop a pseudo-likelihood function of the species tree to obtain maximum pseudo-likelihood estimates (MPE) of species trees, with branch lengths of the species tree in coalescent units.
RESULTS: We show that the MPE of the species tree is statistically consistent as the number M of genes goes to infinity. In addition, the probability that the MPE of the species tree matches the true species tree converges to 1 at rate O(M^{-1}). The simulation results confirm that the maximum pseudo-likelihood approach is statistically consistent even when the species tree is in the anomaly zone. We applied our method, Maximum Pseudo-likelihood for Estimating Species Trees (MP-EST), to a mammal dataset. The four major clades found in the MP-EST tree are consistent with those in the Bayesian concatenation tree. The bootstrap supports for the species tree estimated by the MP-EST method are more reasonable than the posterior probability supports given by the Bayesian concatenation method in reflecting the level of uncertainty in gene trees and controversies over the relationship of four major groups of placental mammals.
CONCLUSIONS: MP-EST can consistently estimate the topology and branch lengths (in coalescent units) of the species tree. Although the pseudo-likelihood is derived from coalescent theory, and assumes no gene flow or horizontal gene transfer (HGT), the MP-EST method is robust to a small amount of HGT in the dataset. In addition, increasing the number of genes does not increase the computational time substantially. The MP-EST method is fast for analyzing datasets that involve a large number of genes but a moderate number of species.
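The abstract does not spell out the pseudo-likelihood function. As a rough illustration only, the sketch below assumes the rooted-triple formulation from standard coalescent theory: for three species whose internal branch has length t in coalescent units, the gene-tree topology matching the species tree has probability 1 - (2/3)e^{-t}, and each of the two discordant topologies has probability (1/3)e^{-t}. The function names and the single-triple closed form are illustrative assumptions, not the MP-EST implementation; a full pseudo-likelihood would multiply such terms over all species triples.

```python
import math


def triple_probs(t):
    """Rooted-triple topology probabilities for one species triple.

    t is the internal branch length in coalescent units (assumed >= 0).
    The first entry is the topology matching the species tree; the other
    two are the discordant topologies.
    """
    discordant = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discordant, discordant, discordant)


def triple_log_pseudo_likelihood(counts, t):
    """Multinomial log-likelihood, up to an additive constant, for one triple.

    counts = (matching, discordant_1, discordant_2) gene-tree counts.
    Zero counts are skipped so that log(0) is never evaluated for them.
    """
    return sum(n * math.log(p)
               for n, p in zip(counts, triple_probs(t)) if n > 0)


def estimate_branch_length(counts):
    """Value of t that maximises the single-triple log-likelihood above.

    Solves (2/3) * exp(-t) = observed discordant fraction and clamps the
    result at t = 0 when discordance exceeds 2/3.
    """
    total = sum(counts)
    discordant_fraction = (counts[1] + counts[2]) / total
    if discordant_fraction == 0:
        return math.inf  # no discordance observed: t is unbounded
    return max(0.0, -math.log(1.5 * discordant_fraction))
```

With hypothetical counts such as (80, 10, 10), `estimate_branch_length` returns the branch length at which the predicted discordant fraction equals the observed 20%.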
format Text
id pubmed-2976751
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2976751 2010-11-10 A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. Liu, Liang; Yu, Lili; Edwards, Scott V. BMC Evol Biol. Research Article. BioMed Central 2010-10-11 /pmc/articles/PMC2976751/ /pubmed/20937096 http://dx.doi.org/10.1186/1471-2148-10-302 Text en Copyright ©2010 Liu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A maximum pseudo-likelihood approach for estimating species trees under the coalescent model
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2976751/
https://www.ncbi.nlm.nih.gov/pubmed/20937096
http://dx.doi.org/10.1186/1471-2148-10-302