CHOP: haplotype-aware path indexing in population graphs
Main authors: | Mokveld, Tom; Linthorst, Jasper; Al-Ars, Zaid; Holstege, Henne; Reinders, Marcel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7066762/ https://www.ncbi.nlm.nih.gov/pubmed/32160922 http://dx.doi.org/10.1186/s13059-020-01963-y |
_version_ | 1783505305181618176 |
---|---|
author | Mokveld, Tom; Linthorst, Jasper; Al-Ars, Zaid; Holstege, Henne; Reinders, Marcel |
collection | PubMed |
description | The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project. |
format | Online Article Text |
id | pubmed-7066762 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7066762 2020-03-18 CHOP: haplotype-aware path indexing in population graphs Mokveld, Tom Linthorst, Jasper Al-Ars, Zaid Holstege, Henne Reinders, Marcel Genome Biol Method The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of increasing pattern length and encoded variations inevitably leads to a combinatorial explosion of the search space. Instead of heuristic filtering or pruning steps to reduce the complexity, we propose CHOP, a method that constrains the search space by exploiting haplotype information, bounding the search space to the number of haplotypes so that a combinatorial explosion is prevented. We show that CHOP can be applied to large and complex datasets, by applying it on a graph-based representation of the human genome encoding all 80 million variants reported by the 1000 Genomes Project. BioMed Central 2020-03-11 /pmc/articles/PMC7066762/ /pubmed/32160922 http://dx.doi.org/10.1186/s13059-020-01963-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | CHOP: haplotype-aware path indexing in population graphs |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7066762/ https://www.ncbi.nlm.nih.gov/pubmed/32160922 http://dx.doi.org/10.1186/s13059-020-01963-y |
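The description above states that CHOP bounds the search space by the number of haplotypes rather than by the number of paths through the graph. The sketch below is a toy illustration of that contrast only, not the CHOP algorithm or its tooling. It assumes a hypothetical graph model (a linear list of shared segments and biallelic variant sites, with each haplotype given as one allele index per site), and the names `all_path_sequences`, `haplotype_sequences`, `kmers`, `SEGMENTS`, and `HAPLOTYPES` are invented for illustration. Enumerating every path yields 2^n sequences for n independent biallelic sites, while following haplotypes yields at most one sequence per haplotype.

```python
from itertools import product
from typing import List, Sequence, Set, Union

# Hypothetical toy graph model (not CHOP's data structure): a plain string is a
# shared segment; a list of strings is a variant site holding alternative alleles.
Segment = Union[str, List[str]]


def all_path_sequences(segments: Sequence[Segment]) -> Set[str]:
    """Spell every source-to-sink path: 2**n sequences for n biallelic sites."""
    choices = [seg if isinstance(seg, list) else [seg] for seg in segments]
    return {"".join(combo) for combo in product(*choices)}


def haplotype_sequences(segments: Sequence[Segment],
                        haplotypes: Sequence[Sequence[int]]) -> Set[str]:
    """Spell only the paths followed by haplotypes: at most len(haplotypes)."""
    sequences = set()
    for hap in haplotypes:
        alleles = iter(hap)  # one allele index per variant site, in site order
        parts = [seg[next(alleles)] if isinstance(seg, list) else seg
                 for seg in segments]
        sequences.add("".join(parts))
    return sequences


def kmers(sequences: Set[str], k: int) -> Set[str]:
    """All length-k substrings of the given sequences (the strings to index)."""
    return {s[i:i + k] for s in sequences for i in range(len(s) - k + 1)}


# Toy data (not from the paper): three biallelic sites, three haplotypes.
SEGMENTS: List[Segment] = ["AC", ["G", "T"], "CA", ["A", "C"], "GT", ["T", "G"], "AA"]
HAPLOTYPES = [(0, 0, 0), (1, 1, 1), (0, 1, 0)]

if __name__ == "__main__":
    every_path = all_path_sequences(SEGMENTS)
    haplotype_paths = haplotype_sequences(SEGMENTS, HAPLOTYPES)
    print(len(every_path), len(haplotype_paths))  # 8 3
    # 5-mers spelled only by allele combinations that no haplotype carries.
    print(sorted(kmers(every_path, 5) - kmers(haplotype_paths, 5)))
```

With three variant sites the graph spells 8 sequences while the three haplotypes spell 3. Adding sites doubles the first number each time, whereas the second stays bounded by the number of haplotypes. Substrings such as `TCAAG`, which occur only on unobserved allele combinations, are the kind of spurious matches a haplotype-constrained index would presumably avoid.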