STBase: One Million Species Trees for Comparative Biology

Bibliographic details
Main authors: McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.
Format: Online article, text
Language: English
Published: Public Library of Science, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4332655/
https://www.ncbi.nlm.nih.gov/pubmed/25679219
http://dx.doi.org/10.1371/journal.pone.0117987
Collection: PubMed
Description: Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species-level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species-level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free, singly labeled species-level trees that can be combined between loci. Second, the impact of missing data in multi-locus data sets is ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database and are typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files, and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species-tree-oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.
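The description mentions assembling only "decisive" multi-locus data sets. As a minimal illustration of what that property means, the Python sketch below tests a small taxon coverage pattern (which taxa are sampled at which locus) by brute force. It assumes the four-way partition characterization of decisiveness for unrooted trees: a pattern is decisive exactly when, for every split of the taxa into four nonempty blocks, some locus samples at least one taxon from each block. The function names `find_violating_partition` and `is_decisive` are hypothetical, this is not STBase's own procedure, and the enumeration is exponential in the number of taxa, so it only suits toy examples.

```python
from itertools import product


def find_violating_partition(taxa, loci):
    """Search for a four-way partition of `taxa` that no locus spans.

    taxa: iterable of taxon names.
    loci: iterable of sets; each set holds the taxa sampled at one locus.
    Returns the four blocks of the first partition found in which no locus
    contains a taxon from every block, or None if no such partition exists.
    Brute force over 4**len(taxa) block assignments: toy examples only.
    """
    taxa = list(taxa)
    index = {t: i for i, t in enumerate(taxa)}
    # Represent each locus by the positions of the taxa it samples.
    coverage = [[index[t] for t in locus if t in index] for locus in loci]
    for labels in product(range(4), repeat=len(taxa)):
        if len(set(labels)) < 4:
            continue  # some block is empty, so this is not a four-way partition
        spanned = any(len({labels[i] for i in cov}) == 4 for cov in coverage)
        if not spanned:
            return [{t for t, lab in zip(taxa, labels) if lab == b} for b in range(4)]
    return None


def is_decisive(taxa, loci):
    """True if every four-way partition of the taxa is spanned by some locus."""
    return find_violating_partition(taxa, loci) is None


if __name__ == "__main__":
    # Two loci sharing A, B, C: no locus samples D and E together.
    two_loci = [set("ABCD"), set("ABCE")]
    print(is_decisive("ABCDE", two_loci))
    print(find_violating_partition("ABCDE", two_loci))

    # Every locus contains A; together they pair A with each triple of B..E.
    reference_taxon = [set("ABCD"), set("ABCE"), set("ABDE"), set("ACDE")]
    print(is_decisive("ABCDE", reference_taxon))
```

The second pattern is decisive even though no locus samples B, C, D and E together: because A occurs at every locus, the loci jointly cover A with every triple of the remaining taxa, which is enough to span every four-way partition.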
Record ID: pubmed-4332655
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2015-02-13
Rights: © 2015 McMahon et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.