
Treemmer: a tool to reduce large phylogenetic datasets with minimal loss of diversity



Bibliographic Details
Main authors: Menardo, Fabrizio, Loiseau, Chloé, Brites, Daniela, Coscolla, Mireia, Gygli, Sebastian M., Rutaihwa, Liliana K., Trauner, Andrej, Beisel, Christian, Borrell, Sonia, Gagneux, Sebastien
Format: Online Article Text
Language: English
Published: BioMed Central, 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5930393/
https://www.ncbi.nlm.nih.gov/pubmed/29716518
http://dx.doi.org/10.1186/s12859-018-2164-8
Description
Summary: BACKGROUND: Large sequence datasets are difficult to visualize and handle. Additionally, they often do not represent a random subset of the natural diversity, but the result of uncoordinated and convenience sampling. Consequently, they can suffer from redundancy and sampling biases. RESULTS: Here we present Treemmer, a simple tool to evaluate the redundancy of phylogenetic trees and reduce their complexity by eliminating leaves that contribute the least to the tree diversity. CONCLUSIONS: Treemmer can reduce the size of datasets with different phylogenetic structures and levels of redundancy while maintaining a sub-sample that is representative of the original diversity. Additionally, it is possible to fine-tune the behavior of Treemmer by including any kind of meta-information, making Treemmer particularly useful for empirical studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2164-8) contains supplementary material, which is available to authorized users.
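The summary states only that Treemmer removes the leaves that contribute the least to tree diversity. The sketch below illustrates that idea under stated assumptions. It is not Treemmer's actual algorithm or code. It assumes that diversity is measured as total tree length (the sum of branch lengths) and that the tree is unrooted, with every internal node of degree 3 or more. Under those assumptions, deleting a leaf costs exactly its pendant branch length, because the parent node that is left with degree 2 can be merged into a single edge without changing the total length. The greedy step therefore removes the leaf with the shortest pendant branch. The adjacency-map representation, the function names (`tree_length`, `leaves`, `remove_leaf`, `trim`) and the stopping rule (a target number of leaves) are all hypothetical. Meta-information constraints are not modeled.

```python
from typing import Dict, List

# Hypothetical representation: an unrooted tree as an adjacency map
# node -> {neighbour: branch_length}. Leaves are nodes of degree 1;
# internal nodes are assumed to have degree >= 3.
Tree = Dict[str, Dict[str, float]]


def tree_length(tree: Tree) -> float:
    """Total branch length (diversity proxy); each edge is stored twice."""
    return sum(w for nbrs in tree.values() for w in nbrs.values()) / 2


def leaves(tree: Tree) -> List[str]:
    """Return all degree-1 nodes."""
    return [n for n, nbrs in tree.items() if len(nbrs) == 1]


def remove_leaf(tree: Tree, leaf: str) -> None:
    """Delete a leaf in place and suppress its parent if it drops to degree 2."""
    parent = next(iter(tree[leaf]))
    del tree[parent][leaf]
    del tree[leaf]
    if len(tree[parent]) == 2:
        # Join the parent's two remaining edges into one; total length is unchanged.
        (a, wa), (b, wb) = tree[parent].items()
        del tree[a][parent]
        del tree[b][parent]
        del tree[parent]
        tree[a][b] = tree[b][a] = wa + wb


def trim(tree: Tree, target_leaves: int) -> Tree:
    """Greedily remove the leaf with the shortest pendant branch (the smallest
    loss of total tree length) until only target_leaves leaves remain."""
    if target_leaves < 2:
        raise ValueError("keep at least two leaves")
    tree = {n: dict(nbrs) for n, nbrs in tree.items()}  # leave the input untouched
    while len(leaves(tree)) > target_leaves:
        cheapest = min(leaves(tree), key=lambda n: next(iter(tree[n].values())))
        remove_leaf(tree, cheapest)
    return tree
```

As a usage example, take the unrooted tree ((A:1,B:1):2,C:5,(D:0.1,E:3):1), which has a total length of 13.1. Trimming it to four leaves drops D, whose 0.1 branch is nearly redundant with E. That leaves A, B, C and E with a total length of 13.0. Any rule that respects metadata, or that stops on a diversity threshold instead of a leaf count, would replace the `min` selection or the loop condition in `trim`.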