Haplotype-aware graph indexes
MOTIVATION: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes. RESULTS: We augment the VG model with haplotype information to identify which pat...
Main authors: | Sirén, Jouni; Garrison, Erik; Novak, Adam M; Paten, Benedict; Durbin, Richard |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7223266/ https://www.ncbi.nlm.nih.gov/pubmed/31406990 http://dx.doi.org/10.1093/bioinformatics/btz575 |
_version_ | 1783533726231166976 |
---|---|
author | Sirén, Jouni Garrison, Erik Novak, Adam M Paten, Benedict Durbin, Richard |
author_facet | Sirén, Jouni Garrison, Erik Novak, Adam M Paten, Benedict Durbin, Richard |
author_sort | Sirén, Jouni |
collection | PubMed |
description | MOTIVATION: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes. RESULTS: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. AVAILABILITY AND IMPLEMENTATION: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7223266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7223266 2020-05-19 Haplotype-aware graph indexes Sirén, Jouni Garrison, Erik Novak, Adam M Paten, Benedict Durbin, Richard Bioinformatics Original Papers MOTIVATION: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes. RESULTS: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. AVAILABILITY AND IMPLEMENTATION: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-07-26 /pmc/articles/PMC7223266/ /pubmed/31406990 http://dx.doi.org/10.1093/bioinformatics/btz575 Text en © The Author(s) 2019. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Sirén, Jouni Garrison, Erik Novak, Adam M Paten, Benedict Durbin, Richard Haplotype-aware graph indexes |
title | Haplotype-aware graph indexes |
title_full | Haplotype-aware graph indexes |
title_fullStr | Haplotype-aware graph indexes |
title_full_unstemmed | Haplotype-aware graph indexes |
title_short | Haplotype-aware graph indexes |
title_sort | haplotype-aware graph indexes |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7223266/ https://www.ncbi.nlm.nih.gov/pubmed/31406990 http://dx.doi.org/10.1093/bioinformatics/btz575 |
work_keys_str_mv | AT sirenjouni haplotypeawaregraphindexes AT garrisonerik haplotypeawaregraphindexes AT novakadamm haplotypeawaregraphindexes AT patenbenedict haplotypeawaregraphindexes AT durbinrichard haplotypeawaregraphindexes |