Cargando…
A graph extension of the positional Burrows–Wheeler transform and its applications
We present a generalization of the positional Burrows–Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5505026/ https://www.ncbi.nlm.nih.gov/pubmed/28702075 http://dx.doi.org/10.1186/s13015-017-0109-9 |
_version_ | 1783249400948064256 |
---|---|
author | Novak, Adam M. Garrison, Erik Paten, Benedict |
author_facet | Novak, Adam M. Garrison, Erik Paten, Benedict |
author_sort | Novak, Adam M. |
collection | PubMed |
description | We present a generalization of the positional Burrows–Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that with the gPBWT of the order of 100,000 diploid genomes, including all forms structural variation, could be stored and made searchable for haplotype queries using a single large compute node. |
format | Online Article Text |
id | pubmed-5505026 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-55050262017-07-12 A graph extension of the positional Burrows–Wheeler transform and its applications Novak, Adam M. Garrison, Erik Paten, Benedict Algorithms Mol Biol Research We present a generalization of the positional Burrows–Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that with the gPBWT of the order of 100,000 diploid genomes, including all forms structural variation, could be stored and made searchable for haplotype queries using a single large compute node. BioMed Central 2017-07-11 /pmc/articles/PMC5505026/ /pubmed/28702075 http://dx.doi.org/10.1186/s13015-017-0109-9 Text en © The Author(s) 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Novak, Adam M. Garrison, Erik Paten, Benedict A graph extension of the positional Burrows–Wheeler transform and its applications |
title | A graph extension of the positional Burrows–Wheeler transform and its applications |
title_full | A graph extension of the positional Burrows–Wheeler transform and its applications |
title_fullStr | A graph extension of the positional Burrows–Wheeler transform and its applications |
title_full_unstemmed | A graph extension of the positional Burrows–Wheeler transform and its applications |
title_short | A graph extension of the positional Burrows–Wheeler transform and its applications |
title_sort | graph extension of the positional burrows–wheeler transform and its applications |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5505026/ https://www.ncbi.nlm.nih.gov/pubmed/28702075 http://dx.doi.org/10.1186/s13015-017-0109-9 |
work_keys_str_mv | AT novakadamm agraphextensionofthepositionalburrowswheelertransformanditsapplications AT garrisonerik agraphextensionofthepositionalburrowswheelertransformanditsapplications AT patenbenedict agraphextensionofthepositionalburrowswheelertransformanditsapplications AT novakadamm graphextensionofthepositionalburrowswheelertransformanditsapplications AT garrisonerik graphextensionofthepositionalburrowswheelertransformanditsapplications AT patenbenedict graphextensionofthepositionalburrowswheelertransformanditsapplications |