An analytical upper bound on the number of loci required for all splits of a species tree to appear in a set of gene trees
Main authors: | Uricchio, Lawrence H.; Warnow, Tandy; Rosenberg, Noah A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123308/ https://www.ncbi.nlm.nih.gov/pubmed/28185570 http://dx.doi.org/10.1186/s12859-016-1266-4 |
author | Uricchio, Lawrence H.; Warnow, Tandy; Rosenberg, Noah A. |
---|---|
collection | PubMed |
description | BACKGROUND: Many methods for species tree inference require data from a sufficiently large sample of genomic loci in order to produce accurate estimates. However, few studies have attempted to use analytical theory to quantify “sufficiently large”. RESULTS: Using the multispecies coalescent model, we report a general analytical upper bound on the number of gene trees n required such that with probability q, each bipartition of a species tree is represented at least once in a set of n random gene trees. This bound employs a formula that is straightforward to compute, depends only on the minimum internal branch length of the species tree and the number of taxa, and applies irrespective of the species tree topology. Using simulations, we investigate numerical properties of the bound as well as its accuracy under the multispecies coalescent. CONCLUSIONS: Our results are helpful for conservatively bounding the number of gene trees required by the ASTRAL inference method, and the approach has potential to be extended to bound other properties of gene tree sets under the model. |
format | Online Article Text |
id | pubmed-5123308 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Research article), BioMed Central; published 2016-11-11 |
rights | © The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | An analytical upper bound on the number of loci required for all splits of a species tree to appear in a set of gene trees |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123308/ https://www.ncbi.nlm.nih.gov/pubmed/28185570 http://dx.doi.org/10.1186/s12859-016-1266-4 |
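The description field above reports a bound on the number of gene trees n needed so that, with probability q, every bipartition of the species tree appears at least once. This record does not reproduce the paper's formula, so the sketch below does not implement it. It only illustrates the generic union-bound reasoning behind bounds of this kind, under two loudly stated assumptions: gene trees are drawn independently, and each nontrivial split appears in a single random gene tree with probability at least `p_min`, a hypothetical input the caller must supply. For k = 4 taxa, `p_min` can be taken as the standard multispecies-coalescent probability 1 - (2/3)e^(-t) that the gene tree matches the species tree when the internal branch has length t coalescent units. For larger trees, deriving `p_min` from the minimum internal branch length is precisely the step this sketch leaves out. The names `quartet_match_probability` and `gene_trees_needed` are illustrative only.

```python
import math


def quartet_match_probability(t: float) -> float:
    """Probability that a 4-taxon gene tree has the species-tree topology
    under the multispecies coalescent, given internal branch length t
    (coalescent units): 1 - (2/3) * exp(-t)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def gene_trees_needed(num_taxa: int, p_min: float, q: float) -> int:
    """Smallest n with (num_taxa - 3) * (1 - p_min) ** n <= 1 - q.

    Hypothetical union-bound sketch, NOT the paper's formula. Assumes
    independent gene trees and that every nontrivial split appears in a
    random gene tree with probability at least p_min. An unrooted binary
    species tree on num_taxa taxa has m = num_taxa - 3 nontrivial splits,
    so P(some split never appears in n trees) <= m * (1 - p_min) ** n.
    """
    if num_taxa < 4:
        raise ValueError("need at least 4 taxa for a nontrivial split")
    if not (0.0 < p_min < 1.0 and 0.0 < q < 1.0):
        raise ValueError("p_min and q must lie strictly between 0 and 1")
    m = num_taxa - 3
    target = 1.0 - q  # allowed probability that some split is missing

    def bound(n: int) -> float:
        # Union bound on the probability that at least one split is absent.
        return m * (1.0 - p_min) ** n

    # Closed-form solution of m * (1 - p_min) ** n <= target, rounded up.
    n = max(1, math.ceil(math.log(target / m) / math.log(1.0 - p_min)))
    # Correct for floating-point rounding so n is the exact smallest integer.
    while bound(n) > target:
        n += 1
    while n > 1 and bound(n - 1) <= target:
        n -= 1
    return n


if __name__ == "__main__":
    p4 = quartet_match_probability(0.1)
    print(gene_trees_needed(4, p4, 0.95))
    print(gene_trees_needed(10, 0.5, 0.95))
```

Under these assumptions, running the script prints 6 (four taxa, t = 0.1, q = 0.95) and then 8 (ten taxa, `p_min` = 0.5, q = 0.95). Because the number of splits enters only through log(m), the union bound grows slowly with the number of taxa for a fixed `p_min`; in practice `p_min` itself depends on the tree, which is where a result like the one described above is needed.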