
Inference of species phylogenies from bi-allelic markers using pseudo-likelihood

MOTIVATION: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g. single nucleotide polymorphism da...


Bibliographic Details
Main authors: Zhu, Jiafan, Nakhleh, Luay
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022577/
https://www.ncbi.nlm.nih.gov/pubmed/29950004
http://dx.doi.org/10.1093/bioinformatics/bty295
author Zhu, Jiafan
Nakhleh, Luay
author_facet Zhu, Jiafan
Nakhleh, Luay
author_sort Zhu, Jiafan
collection PubMed
description MOTIVATION: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g. single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method’s applicability. RESULTS: In this article, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger datasets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss. AVAILABILITY AND IMPLEMENTATION: The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).
format Online
Article
Text
id pubmed-6022577
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022577 2018-07-10 Inference of species phylogenies from bi-allelic markers using pseudo-likelihood Zhu, Jiafan Nakhleh, Luay Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022577/ /pubmed/29950004 http://dx.doi.org/10.1093/bioinformatics/bty295 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Zhu, Jiafan
Nakhleh, Luay
Inference of species phylogenies from bi-allelic markers using pseudo-likelihood
title Inference of species phylogenies from bi-allelic markers using pseudo-likelihood
title_sort inference of species phylogenies from bi-allelic markers using pseudo-likelihood
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022577/
https://www.ncbi.nlm.nih.gov/pubmed/29950004
http://dx.doi.org/10.1093/bioinformatics/bty295