
A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transpare...


Bibliographic Details
Main authors: Redelings, Benjamin D., Holder, Mark T.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335690/
https://www.ncbi.nlm.nih.gov/pubmed/28265520
http://dx.doi.org/10.7717/peerj.3058
author Redelings, Benjamin D.
Holder, Mark T.
collection PubMed
description We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support and which conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on the input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous one in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce every version of the project’s “synthetic tree” from version 5 onward. The software pipeline is called “propinquity”. It relies heavily on “otcetera”, a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub.
format Online
Article
Text
id pubmed-5335690
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5335690 2017-03-06 PeerJ Bioinformatics PeerJ Inc. 2017-03-01 /pmc/articles/PMC5335690/ /pubmed/28265520 http://dx.doi.org/10.7717/peerj.3058 Text en ©2017 Redelings and Holder http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335690/
https://www.ncbi.nlm.nih.gov/pubmed/28265520
http://dx.doi.org/10.7717/peerj.3058