
Insertion and deletion correcting DNA barcodes based on watermarks

BACKGROUND: Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: Synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries can individually be labeled with barcode...


Bibliographic Details
Main authors: Kracht, David; Schober, Steffen
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339740/
https://www.ncbi.nlm.nih.gov/pubmed/25887410
http://dx.doi.org/10.1186/s12859-015-0482-7
author Kracht, David
Schober, Steffen
collection PubMed
description BACKGROUND: Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: Synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries can individually be labeled with barcodes for a joint sequencing procedure. A post-processing step is needed to sort the sequencing data according to their origin, utilizing these DNA labels. The final separation step is called demultiplexing and is mainly determined by the characteristics of the DNA code words used as labels. Currently, we are facing two different strategies for barcoding: One is based on the Hamming distance, the other uses the edit metric to measure distances of code words. The theory of channel coding provides well-known code constructions for the Hamming metric. They provide a large number of code words with variable lengths and maximal correction capability regarding substitution errors. However, some sequencing platforms are known to have exceptionally high numbers of insertion or deletion errors. Barcodes based on the edit distance can take insertion and deletion errors into account in the decoding process. Unfortunately, there is no explicit code construction known that gives optimal codes for the edit metric. RESULTS: In the present work we focus on an entirely different perspective to obtain DNA barcodes. We consider a concatenated code construction, producing so-called watermark codes, which were first proposed by Davey and MacKay, to communicate via binary channels with synchronization errors. We adapt and extend the concepts of watermark codes to use them for DNA sequencing. Moreover, we provide an exemplary set of barcodes that are experimentally compatible with common next-generation sequencing platforms. Finally, a realistic simulation scenario is used to evaluate the proposed codes to show that the watermark concept is suitable for DNA sequencing applications.
CONCLUSION: Our adaptation of watermark codes enables the construction of barcodes that are capable of correcting substitution, insertion, and deletion errors. The presented approach has the advantage of not needing any markers or technical sequences to recover the position of the barcode in the sequencing reads, a requirement that poses a significant restriction in other approaches. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0482-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4339740
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4339740 2015-02-26 Insertion and deletion correcting DNA barcodes based on watermarks Kracht, David Schober, Steffen BMC Bioinformatics Methodology Article BioMed Central 2015-02-18 /pmc/articles/PMC4339740/ /pubmed/25887410 http://dx.doi.org/10.1186/s12859-015-0482-7 Text en © Kracht and Schober; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Insertion and deletion correcting DNA barcodes based on watermarks
topic Methodology Article