
An analytical upper bound on the number of loci required for all splits of a species tree to appear in a set of gene trees


Bibliographic Details
Main authors: Uricchio, Lawrence H.; Warnow, Tandy; Rosenberg, Noah A.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123308/
https://www.ncbi.nlm.nih.gov/pubmed/28185570
http://dx.doi.org/10.1186/s12859-016-1266-4
collection PubMed
description BACKGROUND: Many methods for species tree inference require data from a sufficiently large sample of genomic loci in order to produce accurate estimates. However, few studies have attempted to use analytical theory to quantify “sufficiently large”. RESULTS: Using the multispecies coalescent model, we report a general analytical upper bound on the number of gene trees n required such that with probability q, each bipartition of a species tree is represented at least once in a set of n random gene trees. This bound employs a formula that is straightforward to compute, depends only on the minimum internal branch length of the species tree and the number of taxa, and applies irrespective of the species tree topology. Using simulations, we investigate numerical properties of the bound as well as its accuracy under the multispecies coalescent. CONCLUSIONS: Our results are helpful for conservatively bounding the number of gene trees required by the ASTRAL inference method, and the approach has potential to be extended to bound other properties of gene tree sets under the model.
id pubmed-5123308
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5123308 (2016-12-06). BMC Bioinformatics, Research. BioMed Central 2016-11-11. © The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research
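The abstract describes a bound on the number n of gene trees needed so that, with probability q, every bipartition of the species tree appears at least once. The authors' own formula, which is expressed through the minimum internal branch length and the number of taxa, is not reproduced in this record, so the sketch below illustrates only a generic union-bound step that a bound of this kind can rest on. It assumes (1) each nontrivial split appears in a single random gene tree with probability at least p_min, and (2) the n gene trees are independent draws. The function name `loci_needed` and the parameter `p_min` are hypothetical; this is not the paper's formula.

```python
import math


def loci_needed(num_splits: int, p_min: float, q: float) -> int:
    """Smallest n such that num_splits * (1 - p_min) ** n <= 1 - q.

    ASSUMPTIONS (illustrative, not the paper's derivation): every split
    appears in any one gene tree with probability at least p_min, and the
    n gene trees are independent. By the union bound,
    P(some split absent from all n trees) <= num_splits * (1 - p_min) ** n,
    so this n gives probability >= q that every split appears at least once.
    """
    if not 0.0 < p_min <= 1.0:
        raise ValueError("p_min must lie in (0, 1]")
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    if num_splits <= 0:
        return 0  # no splits to cover
    if p_min == 1.0:
        return 1  # every gene tree displays every split

    def ok(n: int) -> bool:
        # Union-bound failure probability is at most 1 - q.
        return num_splits * (1.0 - p_min) ** n <= 1.0 - q

    # Closed form: n >= ln(num_splits / (1 - q)) / -ln(1 - p_min).
    n = max(0, math.ceil(math.log(num_splits / (1.0 - q)) / -math.log1p(-p_min)))
    # Step by one to absorb floating-point rounding at the boundary.
    while n > 0 and ok(n - 1):
        n -= 1
    while not ok(n):
        n += 1
    return n
```

For an unrooted binary species tree on k taxa there are k - 3 nontrivial bipartitions (single-leaf splits appear in every gene tree), so with k = 10, an assumed p_min of 0.5, and q = 0.95, `loci_needed(7, 0.5, 0.95)` returns 8. The values here are purely illustrative; in practice p_min would have to come from a lower bound like the one the paper derives.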