CHOP: haplotype-aware path indexing in population graphs

Bibliographic Details
Main authors: Mokveld, Tom; Linthorst, Jasper; Al-Ars, Zaid; Holstege, Henne; Reinders, Marcel
Format: Online article, text
Language: English
Published: BioMed Central, 2020
Subjects: Method
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7066762/
https://www.ncbi.nlm.nih.gov/pubmed/32160922
http://dx.doi.org/10.1186/s13059-020-01963-y

Abstract: The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project.
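To make the abstract's complexity argument concrete, here is a minimal Python sketch under simplifying assumptions; it is not the CHOP algorithm itself. It models a population graph as a hypothetical chain of variant "bubbles" separated by constant sequence, and compares indexing the length-k substrings of every path through the graph (which grows as the product of allele counts, i.e. 2^n for n biallelic sites) with indexing only the paths spelled by known haplotypes (bounded by the number of haplotypes). All function names and the toy data are illustrative.

```python
from itertools import product

def build_sequence(backbone, alleles, choice):
    """Spell one path through a hypothetical chain of bubbles.

    backbone: n+1 constant segments surrounding n variant sites.
    alleles:  n tuples, the alternative sequences at each site.
    choice:   n allele indices, one per site.
    """
    parts = [backbone[0]]
    for site, allele_idx in enumerate(choice):
        parts.append(alleles[site][allele_idx])
        parts.append(backbone[site + 1])
    return "".join(parts)

def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def index_all_paths(backbone, alleles, k):
    """Unconstrained: every combination of alleles is a path,
    so the number of paths is the product of allele counts."""
    index = set()
    n_paths = 0
    for choice in product(*(range(len(a)) for a in alleles)):
        index |= kmers(build_sequence(backbone, alleles, choice), k)
        n_paths += 1
    return n_paths, index

def index_haplotypes(backbone, alleles, haplotypes, k):
    """Haplotype-constrained: only paths observed in the panel are indexed,
    so the number of paths is bounded by the number of haplotypes."""
    index = set()
    for choice in haplotypes:
        index |= kmers(build_sequence(backbone, alleles, choice), k)
    return len(haplotypes), index

# Toy example: three biallelic sites, two observed haplotypes.
backbone = ["AC", "GT", "CA", "TG"]
alleles = [("A", "C"), ("G", "T"), ("A", "T")]
haplotypes = [(0, 0, 0), (1, 1, 1)]

n_all, all_index = index_all_paths(backbone, alleles, k=5)
n_hap, hap_index = index_haplotypes(backbone, alleles, haplotypes, k=5)
print(n_all, n_hap)  # 8 2
```

Adding one more biallelic site doubles `n_all`, while `n_hap` stays at the panel size; the haplotype-constrained index is always a subset of the unconstrained one.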
Source: PubMed record pubmed-7066762 (MEDLINE/PubMed format), National Center for Biotechnology Information
Journal: Genome Biol (Genome Biology), article type: Method. Published online 2020-03-11.
© The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.