Kohdista: an efficient method to index and query possible Rmap alignments

Bibliographic Details
Main authors: Muggli, Martin D., Puglisi, Simon J., Boucher, Christina
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907254/
https://www.ncbi.nlm.nih.gov/pubmed/31867049
http://dx.doi.org/10.1186/s13015-019-0160-9
author Muggli, Martin D.
Puglisi, Simon J.
Boucher, Christina
collection PubMed
description BACKGROUND: Genome-wide optical maps are ordered high-resolution restriction maps that give the positions of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical maps are assembled with an overlap-layout-consensus approach from raw optical map data, which are referred to as Rmaps. Due to the high error rate of Rmap data, finding the overlap between Rmaps remains challenging. RESULTS: We present Kohdista, an index-based algorithm for finding pairwise alignments between single-molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree in order to build Kohdista. We validate Kohdista on simulated E. coli data, showing that the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions. CONCLUSION: We demonstrate that Kohdista is the only method capable of finding a significant number of high-quality pairwise Rmap alignments for large eukaryote organisms in reasonable time.
format Online
Article
Text
id pubmed-6907254
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6907254 2019-12-20 Kohdista: an efficient method to index and query possible Rmap alignments. Muggli, Martin D.; Puglisi, Simon J.; Boucher, Christina. Algorithms Mol Biol. Software Article. BioMed Central 2019-12-12. /pmc/articles/PMC6907254/ /pubmed/31867049 http://dx.doi.org/10.1186/s13015-019-0160-9 Text en. © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Kohdista: an efficient method to index and query possible Rmap alignments
topic Software Article