PanGraph: scalable bacterial pan-genome graph construction

The genomic diversity of microbes is commonly parameterized as SNPs relative to a reference genome of a well-characterized, but arbitrary, isolate. However, any reference genome contains only a fraction of the microbial pangenome, the total set of genes observed in a given species. Reference-based a...

Bibliographic Details
Main authors: Noll, Nicholas; Molari, Marco; Shaw, Liam P.; Neher, Richard A.
Format: Online Article Text
Language: English
Published: Microbiology Society 2023
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327495/
https://www.ncbi.nlm.nih.gov/pubmed/37278719
http://dx.doi.org/10.1099/mgen.0.001034
collection PubMed
description The genomic diversity of microbes is commonly parameterized as SNPs relative to a reference genome of a well-characterized, but arbitrary, isolate. However, any reference genome contains only a fraction of the microbial pangenome, the total set of genes observed in a given species. Reference-based approaches are thus blind to the dynamics of the accessory genome, as well as variation within gene order and copy number. With the widespread usage of long-read sequencing, the number of high-quality, complete genome assemblies has increased dramatically. In addition to pangenomic approaches that focus on the variation in the sets of genes present in different genomes, complete assemblies allow investigations of the evolution of genome structure and gene order. This latter problem, however, is computationally demanding with few tools available that shed light on these dynamics. Here, we present PanGraph, a Julia-based library and command line interface for aligning whole genomes into a graph. Each genome is represented as a path along vertices, which in turn encapsulate homologous multiple sequence alignments. The resultant data structure succinctly summarizes population-level nucleotide and structural polymorphisms and can be exported into several common formats for either downstream analysis or immediate visualization.
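The graph representation described in the abstract (genomes as paths over vertices that each hold a set of homologous sequences) can be illustrated with a minimal Python sketch. This is not PanGraph's API or file format: `Block`, `Path`, `classify_blocks` and `reconstruct` are hypothetical names chosen for illustration, and a per-genome dictionary of substitutions stands in for a real multiple sequence alignment. It also assumes each genome visits a block at most once when applying variants.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """A vertex: one homologous region, stored as a consensus plus per-genome edits."""
    block_id: str
    consensus: str
    # genome name -> {position: alternative base}; a stand-in for a real alignment
    variants: dict = field(default_factory=dict)


@dataclass
class Path:
    """A genome, written as an ordered walk over (block_id, strand) steps."""
    name: str
    steps: list


def classify_blocks(paths):
    """Split block ids into core (exactly once in every genome) and accessory."""
    counts = {p.name: Counter(b for b, _ in p.steps) for p in paths}
    all_blocks = set().union(*(c.keys() for c in counts.values()))
    core = {b for b in all_blocks if all(c[b] == 1 for c in counts.values())}
    return core, all_blocks - core


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def reconstruct(path, blocks):
    """Rebuild a genome sequence by walking its path and applying its substitutions."""
    parts = []
    for block_id, strand in path.steps:
        block = blocks[block_id]
        seq = list(block.consensus)
        for pos, base in block.variants.get(path.name, {}).items():
            seq[pos] = base
        oriented = "".join(seq)
        parts.append(oriented if strand == "+" else reverse_complement(oriented))
    return "".join(parts)


# Toy data (invented for illustration): g2 lacks block C and carries B inverted.
blocks = {
    "A": Block("A", "ACGT"),
    "B": Block("B", "GGCC", {"g2": {0: "T"}}),
    "C": Block("C", "TTAA"),
}
paths = [
    Path("g1", [("A", "+"), ("B", "+"), ("C", "+")]),
    Path("g2", [("A", "+"), ("B", "-")]),
]
core, accessory = classify_blocks(paths)
```

In this toy graph, the single substitution in block B is the nucleotide-level polymorphism, while the missing block C and the inverted block B in `g2` are structural differences. Both kinds of variation can be read from the same structure.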
id pubmed-10327495
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10327495 2023-07-08 PanGraph: scalable bacterial pan-genome graph construction. Noll, Nicholas; Molari, Marco; Shaw, Liam P.; Neher, Richard A. Microb Genom, Research Articles. Microbiology Society, 2023-06-06. /pmc/articles/PMC10327495/ /pubmed/37278719 http://dx.doi.org/10.1099/mgen.0.001034 Text en © 2023 The Authors. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).
title PanGraph: scalable bacterial pan-genome graph construction
topic Research Articles