A representation of a compressed de Bruijn graph for pan-genome analysis that enables search

BACKGROUND: Recently, Marcus et al. (Bioinformatics 30:3476–83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an [Formula: see text] time algorithm called splitMEM t...

Bibliographic Details
Main Authors: Beller, Timo; Ohlebusch, Enno
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4950428/
https://www.ncbi.nlm.nih.gov/pubmed/27437028
http://dx.doi.org/10.1186/s13015-016-0083-7
_version_ 1782443560774139904
author Beller, Timo
Ohlebusch, Enno
collection PubMed
description BACKGROUND: Recently, Marcus et al. (Bioinformatics 30:3476–83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an [Formula: see text] time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497–504, 2016) improved their result. RESULTS: In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele—a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.
format Online
Article
Text
id pubmed-4950428
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4950428 2016-07-20 A representation of a compressed de Bruijn graph for pan-genome analysis that enables search Beller, Timo Ohlebusch, Enno Algorithms Mol Biol Research BACKGROUND: Recently, Marcus et al. (Bioinformatics 30:3476–83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an [Formula: see text] time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497–504, 2016) improved their result. RESULTS: In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele—a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures. BioMed Central 2016-07-18 /pmc/articles/PMC4950428/ /pubmed/27437028 http://dx.doi.org/10.1186/s13015-016-0083-7 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A representation of a compressed de Bruijn graph for pan-genome analysis that enables search
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4950428/
https://www.ncbi.nlm.nih.gov/pubmed/27437028
http://dx.doi.org/10.1186/s13015-016-0083-7