BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing

Bibliographic details
Main authors: Somervuo, Panu; Koskinen, Patrik; Mei, Peng; Holm, Liisa; Auvinen, Petri; Paulin, Lars
Format: Online article, text
Language: English
Published: BioMed Central, 2018
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034344/
https://www.ncbi.nlm.nih.gov/pubmed/29976145
http://dx.doi.org/10.1186/s12859-018-2262-7
Collection: PubMed

Abstract

BACKGROUND: Current high-throughput sequencing platforms can sequence multiple samples in parallel. Different samples are labeled by attaching a short sample-specific nucleotide sequence, a barcode, to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. To tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint, due to both nucleotide usage and basecalling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. The number of samples to be mixed in each sequencing run may vary, which raises the problem of how to select, for each run, the best subset of the barcodes available at a sequencing core facility. Plenty of tools are available for de novo barcode design, but they are not suitable for subset selection.

RESULTS: We have developed a tool that can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach, the selection process is formulated as a minimization problem. We define the cost function and a set of constraints, and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by the user, the necessary constraints are generated automatically and the optimal solution can be found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.

CONCLUSIONS: The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easily accessible via a web browser.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2262-7) contains supplementary material, which is available to authorized users.
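
The abstract names two constraints (barcodes far enough apart in sequence space to tolerate sequencing errors, and balanced nucleotide proportions at each position) and three tasks (selection, compatibility checking, augmentation). The following is a minimal Python sketch of those ideas under stated assumptions; it is not BARCOSEL's implementation. Assumptions: distance is taken to be the Hamming distance between equal-length barcodes, with a hypothetical minimum `min_dist`; the balance cost `imbalance` is a hypothetical squared deviation from an even 25% share per nucleotide; and where BARCOSEL uses integer programming, this sketch substitutes exhaustive enumeration of subsets, which is only practical for small candidate sets. All function names are illustrative.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_compatible(barcodes, min_dist):
    """True if every pair of barcodes differs in at least min_dist positions."""
    return all(hamming(a, b) >= min_dist for a, b in combinations(barcodes, 2))


def imbalance(barcodes):
    """Hypothetical balance cost: squared deviation of each nucleotide's count
    from an even 1/4 share, summed over all barcode positions (0 = perfect)."""
    if not barcodes:
        return 0.0
    k = len(barcodes)
    cost = 0.0
    for pos in range(len(barcodes[0])):
        column = [bc[pos] for bc in barcodes]
        for nt in NUCLEOTIDES:
            cost += (column.count(nt) - k / 4) ** 2
    return cost


def select_barcodes(candidates, k, min_dist, required=()):
    """Pick k barcodes, always including `required`, that satisfy the distance
    constraint and minimise `imbalance`. Exhaustive search stands in for the
    integer programming used by BARCOSEL. Returns None if nothing is feasible."""
    required = list(required)
    pool = [c for c in candidates if c not in required]
    n_extra = k - len(required)
    if n_extra < 0:
        raise ValueError("more required barcodes than k")
    best, best_cost = None, float("inf")
    for extra in combinations(pool, n_extra):
        subset = required + list(extra)
        if not is_compatible(subset, min_dist):
            continue  # violates the error-tolerance (distance) constraint
        cost = imbalance(subset)
        if cost < best_cost:  # keep the first subset with the lowest cost
            best, best_cost = subset, cost
    return best


if __name__ == "__main__":
    candidates = ["ACGT", "CATG", "GTCA", "TGAC", "AAAA", "ACGA"]
    print(select_barcodes(candidates, k=4, min_dist=3))
    # prints ['ACGT', 'CATG', 'GTCA', 'TGAC']
```

Passing `required` covers the augmentation task (existing barcodes are kept and new ones are added), and calling `is_compatible` on its own covers the compatibility check for libraries that are to be pooled. For realistic candidate sets, the enumeration loop would be replaced by an integer programming formulation handed to an ILP solver, which is the approach the authors describe.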

Institution: National Center for Biotechnology Information
Journal: BMC Bioinformatics (Methodology Article), published by BioMed Central on 2018-07-05
Rights: © The Author(s) 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.