
PhyInformR: phylogenetic experimental design and phylogenomic data exploration in R

BACKGROUND: Analyses of phylogenetic informativeness represent an important step in screening potential or existing datasets for their proclivity toward convergent or parallel evolution of molecular sites. However, while new theory has been developed from which to predict the utility of sequence dat...


Bibliographic Details
Main authors: Dornburg, Alex, Fisk, J. Nick, Tamagnan, Jules, Townsend, Jeffrey P.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5134231/
https://www.ncbi.nlm.nih.gov/pubmed/27905871
http://dx.doi.org/10.1186/s12862-016-0837-3
author Dornburg, Alex
Fisk, J. Nick
Tamagnan, Jules
Townsend, Jeffrey P.
collection PubMed
description BACKGROUND: Analyses of phylogenetic informativeness represent an important step in screening potential or existing datasets for their proclivity toward convergent or parallel evolution of molecular sites. However, while new theory has been developed from which to predict the utility of sequence data, adoption of these advances has been stymied by a lack of software enabling their application, especially for large next-generation sequence datasets. Moreover, there are no theoretical barriers to applying phylogenetic informativeness or calculating quartet internode resolution probabilities in a Bayesian setting that more robustly accounts for uncertainty, yet there is no software with which a computationally intensive Bayesian approach to experimental design could be implemented. RESULTS: We introduce PhyInformR, an open source software package that performs rapid calculation of phylogenetic information content using the latest advances in phylogenetic informativeness-based theory. These advances include modifications that incorporate uneven branch lengths and any model of nucleotide substitution to provide assessments of the phylogenetic utility of any given dataset or dataset partition. PhyInformR provides new tools for data visualization and routines optimized for rapid statistical calculations, including approaches making use of Bayesian posterior distributions and parallel processing. By implementing the computation on user hardware, PhyInformR increases the potential power users can apply toward screening datasets for phylogenetic/genomic information content by orders of magnitude.
CONCLUSIONS: PhyInformR provides a means to implement diverse substitution models and specify uneven branch lengths for phylogenetic informativeness or calculations providing quartet-based probabilities of resolution, produce novel visualizations, and facilitate analyses of next-generation sequence datasets while incorporating phylogenetic uncertainty through the use of parallel processing. As an open source program, PhyInformR is fully customizable and expandable, thereby allowing for advanced methodologies to be readily integrated into local bioinformatics pipelines. Software is available through CRAN, and a package containing the software, a detailed manual, and additional sample data is also provided freely through GitHub: https://github.com/carolinafishes/PhyInformR. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-016-0837-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5134231
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5134231 2016-12-15 PhyInformR: phylogenetic experimental design and phylogenomic data exploration in R Dornburg, Alex Fisk, J. Nick Tamagnan, Jules Townsend, Jeffrey P. BMC Evol Biol Software BioMed Central 2016-12-01 /pmc/articles/PMC5134231/ /pubmed/27905871 http://dx.doi.org/10.1186/s12862-016-0837-3 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PhyInformR: phylogenetic experimental design and phylogenomic data exploration in R
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5134231/
https://www.ncbi.nlm.nih.gov/pubmed/27905871
http://dx.doi.org/10.1186/s12862-016-0837-3