Haplotype-aware graph indexes

Bibliographic details
Main authors: Sirén, Jouni; Garrison, Erik; Novak, Adam M; Paten, Benedict; Durbin, Richard
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7223266/
https://www.ncbi.nlm.nih.gov/pubmed/31406990
http://dx.doi.org/10.1093/bioinformatics/btz575
collection PubMed
description MOTIVATION: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes. RESULTS: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. AVAILABILITY AND IMPLEMENTATION: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
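The abstract builds on the positional Burrows–Wheeler transform (PBWT). As a rough illustration only, the sketch below implements the original, non-graph PBWT of Durbin (2014) over biallelic sites (alleles 0/1), not the authors' GBWT. Roughly speaking, the graph extension replaces sites with graph nodes and alleles with edges to successor nodes. The function names `build_pbwt` and `count_pattern` are hypothetical and are not part of the vg or gbwt APIs. The sketch shows the property that makes haplotype-aware queries cheap: haplotypes sharing a run of alleles form one contiguous interval in each prefix array. Counting how many haplotypes follow a given allele sequence therefore needs only rank queries, which is one way to ask how well a path is supported by real haplotypes.

```python
from typing import List, Sequence, Tuple


def build_pbwt(haplotypes: Sequence[Sequence[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Build PBWT prefix arrays and columns for a biallelic (0/1) haplotype matrix.

    arrays[k] (a_k) orders haplotype indices by their alleles at sites k-1, k-2, ..., 0,
    so haplotypes that agree on the sites just before k are adjacent.
    columns[k] (y_k) lists the alleles at site k in a_k order.
    """
    order = list(range(len(haplotypes)))
    arrays: List[List[int]] = [order]
    columns: List[List[int]] = []
    num_sites = len(haplotypes[0]) if haplotypes else 0
    for k in range(num_sites):
        column = [haplotypes[i][k] for i in order]
        columns.append(column)
        # Stable partition: allele-0 haplotypes first, then allele-1, each kept in a_k order.
        order = [i for i, a in zip(order, column) if a == 0] + [
            i for i, a in zip(order, column) if a == 1
        ]
        arrays.append(order)
    return arrays, columns


def count_pattern(columns: List[List[int]], start: int, pattern: Sequence[int]) -> int:
    """Count haplotypes whose alleles at sites start, start+1, ... equal `pattern`.

    The matching haplotypes occupy a contiguous interval [lo, hi) of the current
    prefix array, so each site only needs count-of-zeros (rank) queries on y_k.
    """
    if not columns:
        return 0
    lo, hi = 0, len(columns[0])  # the empty pattern matches every haplotype
    for offset, allele in enumerate(pattern):
        y = columns[start + offset]
        zeros_total = y.count(0)
        zeros_lo, zeros_hi = y[:lo].count(0), y[:hi].count(0)
        if allele == 0:
            lo, hi = zeros_lo, zeros_hi
        else:
            # Allele-1 haplotypes follow all allele-0 haplotypes in a_{k+1}.
            lo, hi = zeros_total + (lo - zeros_lo), zeros_total + (hi - zeros_hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    haps = [
        [0, 1, 0, 1],
        [0, 1, 1, 1],
        [1, 0, 0, 1],
        [0, 1, 0, 0],
    ]
    arrays, columns = build_pbwt(haps)
    print(count_pattern(columns, 1, [1, 0]))  # 2 (haplotypes 0 and 3)
```

For clarity the rank queries use `list.count`, which costs time linear in the number of haplotypes per site. A scalable index would instead store the columns in compressed form with fast rank support; the PBWT ordering tends to produce long runs of equal alleles, which is what makes such compression effective.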
id pubmed-7223266
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics, Original Papers; published 2019-07-26 (record pubmed-7223266, dated 2020-05-19)
rights © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers