
Hierarchical sets: analyzing pangenome structure through scalable set visualizations

MOTIVATION: The increase in available microbial genome sequences has resulted in an increase in the size of the pangenomes being analyzed. Current pangenome visualizations are not intended for the pangenome sizes possible today and new approaches are necessary in order to convert the increase in ava...


Bibliographic Details
Main author: Pedersen, Thomas Lin
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5447240/
https://www.ncbi.nlm.nih.gov/pubmed/28130242
http://dx.doi.org/10.1093/bioinformatics/btx034
author Pedersen, Thomas Lin
collection PubMed
description MOTIVATION: The increase in available microbial genome sequences has resulted in an increase in the size of the pangenomes being analyzed. Current pangenome visualizations are not intended for the pangenome sizes possible today, and new approaches are necessary to convert the increase in available information into an increase in knowledge. As the pangenome data structure is essentially a collection of sets, we explore the potential for scalable set visualization as a tool for pangenome analysis. RESULTS: We present a new hierarchical clustering algorithm based on set arithmetic that optimizes the intersection sizes along the branches. The intersection and union sizes along the hierarchy are visualized using a composite dendrogram and icicle plot, which, in a pangenome context, shows the evolution of pangenome and core size along the evolutionary hierarchy. Outlying elements, i.e. elements whose presence pattern does not correspond with the hierarchy, can be visualized using hierarchical edge bundles. When applied to pangenome data, this plot shows putative horizontal gene transfers between the genomes and can highlight relationships between genomes that are not represented by the hierarchy. We illustrate the utility of hierarchical sets by applying the method to a pangenome based on 113 Escherichia and Shigella genomes and find that it provides a powerful addition to pangenome analysis. AVAILABILITY AND IMPLEMENTATION: The described clustering algorithm and visualizations are implemented in the hierarchicalSets R package available from CRAN (https://cran.r-project.org/web/packages/hierarchicalSets). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5447240
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5447240 2017-05-31 Hierarchical sets: analyzing pangenome structure through scalable set visualizations Pedersen, Thomas Lin Bioinformatics Original Papers
Oxford University Press 2017-06-01 2017-01-27 /pmc/articles/PMC5447240/ /pubmed/28130242 http://dx.doi.org/10.1093/bioinformatics/btx034 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Hierarchical sets: analyzing pangenome structure through scalable set visualizations
topic Original Papers