
SUMAC: Constructing Phylogenetic Supermatrices and Assessing Partially Decisive Taxon Coverage


Bibliographic Details
Main author: Freyman, William A.
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects: Technical Advance
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666519/
https://www.ncbi.nlm.nih.gov/pubmed/26648681
http://dx.doi.org/10.4137/EBO.S35384
author Freyman, William A.
collection PubMed
description The amount of phylogenetically informative sequence data in GenBank is growing at an exponential rate, and large phylogenetic trees are increasingly used in research. Tools are needed to construct phylogenetic sequence matrices from GenBank data and evaluate the effect of missing data. Supermatrix Constructor (SUMAC) is a tool to data-mine GenBank, construct phylogenetic supermatrices, and assess the phylogenetic decisiveness of a matrix given the pattern of missing sequence data. SUMAC calculates a novel metric, Missing Sequence Decisiveness Scores (MSDS), which measures how much each individual missing sequence contributes to the decisiveness of the matrix. MSDS can be used to compare supermatrices and prioritize the acquisition of new sequence data. SUMAC constructs supermatrices either through an exploratory clustering of all GenBank sequences within a taxonomic group or by using guide sequences to build homologous clusters in a more targeted manner. SUMAC assembles supermatrices for any taxonomic group recognized in GenBank and is optimized to run on multicore computer systems by parallelizing multiple stages of operation. SUMAC is implemented as a Python package that can run as a stand-alone command-line program, or its modules and objects can be incorporated within other programs. SUMAC is released under the open source GPLv3 license and is available at https://github.com/wf8/sumac.
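The idea of scoring each missing sequence by its contribution to decisiveness can be illustrated with a small sketch. It assumes a simplified stand-in for decisiveness: the fraction of taxon triples that are sampled together in at least one locus. The function names and the scoring rule below are hypothetical illustrations, not SUMAC's API and not necessarily its exact MSDS formula.

```python
from itertools import combinations
from math import comb


def covered_triples(matrix):
    """Return the set of taxon triples sampled together in at least one locus.

    `matrix` maps a taxon name to a list of booleans, one per locus
    (True = a sequence is present). This layout is an assumption.
    """
    taxa = sorted(matrix)
    n_loci = len(next(iter(matrix.values())))
    covered = set()
    for locus in range(n_loci):
        present = [t for t in taxa if matrix[t][locus]]
        covered.update(combinations(present, 3))
    return covered


def triple_coverage(matrix):
    """Fraction of all possible taxon triples covered by some locus."""
    total = comb(len(matrix), 3)
    return len(covered_triples(matrix)) / total if total else 1.0


def missing_cell_gains(matrix):
    """For each missing (taxon, locus) cell, count the triples that would
    become newly covered if that one sequence were added."""
    base = len(covered_triples(matrix))
    gains = {}
    for taxon, row in matrix.items():
        for locus, present in enumerate(row):
            if not present:
                trial = {t: list(r) for t, r in matrix.items()}
                trial[taxon][locus] = True
                gains[(taxon, locus)] = len(covered_triples(trial)) - base
    return gains


# Toy matrix: four taxa, two loci.
example = {
    "A": [True, True],
    "B": [True, False],
    "C": [True, True],
    "D": [False, True],
}
coverage = triple_coverage(example)
gains = missing_cell_gains(example)
```

Cells with the largest gain would be the first candidates for new sequencing under this proxy. Enumerating triples grows cubically with the number of taxa, so the sketch suits only small matrices.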
format Online Article Text
id pubmed-4666519
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4666519 2015-12-08 SUMAC: Constructing Phylogenetic Supermatrices and Assessing Partially Decisive Taxon Coverage Freyman, William A. Evol Bioinform Online Technical Advance Libertas Academica 2015-11-30 /pmc/articles/PMC4666519/ /pubmed/26648681 http://dx.doi.org/10.4137/EBO.S35384 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
title SUMAC: Constructing Phylogenetic Supermatrices and Assessing Partially Decisive Taxon Coverage
topic Technical Advance