Cargando…

A graph extension of the positional Burrows–Wheeler transform and its applications

We present a generalization of the positional Burrows–Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a...

Descripción completa

Detalles Bibliográficos
Autores principales: Novak, Adam M., Garrison, Erik, Paten, Benedict
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5505026/
https://www.ncbi.nlm.nih.gov/pubmed/28702075
http://dx.doi.org/10.1186/s13015-017-0109-9
_version_ 1783249400948064256
author Novak, Adam M.
Garrison, Erik
Paten, Benedict
author_facet Novak, Adam M.
Garrison, Erik
Paten, Benedict
author_sort Novak, Adam M.
collection PubMed
description We present a generalization of the positional Burrows–Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that with the gPBWT of the order of 100,000 diploid genomes, including all forms structural variation, could be stored and made searchable for haplotype queries using a single large compute node.
format Online
Article
Text
id pubmed-5505026
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-55050262017-07-12 A graph extension of the positional Burrows–Wheeler transform and its applications Novak, Adam M. Garrison, Erik Paten, Benedict Algorithms Mol Biol Research We present a generalization of the positional Burrows–Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that with the gPBWT of the order of 100,000 diploid genomes, including all forms structural variation, could be stored and made searchable for haplotype queries using a single large compute node. BioMed Central 2017-07-11 /pmc/articles/PMC5505026/ /pubmed/28702075 http://dx.doi.org/10.1186/s13015-017-0109-9 Text en © The Author(s) 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Novak, Adam M.
Garrison, Erik
Paten, Benedict
A graph extension of the positional Burrows–Wheeler transform and its applications
title A graph extension of the positional Burrows–Wheeler transform and its applications
title_full A graph extension of the positional Burrows–Wheeler transform and its applications
title_fullStr A graph extension of the positional Burrows–Wheeler transform and its applications
title_full_unstemmed A graph extension of the positional Burrows–Wheeler transform and its applications
title_short A graph extension of the positional Burrows–Wheeler transform and its applications
title_sort graph extension of the positional burrows–wheeler transform and its applications
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5505026/
https://www.ncbi.nlm.nih.gov/pubmed/28702075
http://dx.doi.org/10.1186/s13015-017-0109-9
work_keys_str_mv AT novakadamm agraphextensionofthepositionalburrowswheelertransformanditsapplications
AT garrisonerik agraphextensionofthepositionalburrowswheelertransformanditsapplications
AT patenbenedict agraphextensionofthepositionalburrowswheelertransformanditsapplications
AT novakadamm graphextensionofthepositionalburrowswheelertransformanditsapplications
AT garrisonerik graphextensionofthepositionalburrowswheelertransformanditsapplications
AT patenbenedict graphextensionofthepositionalburrowswheelertransformanditsapplications