
Development of a program for in silico optimized selection of oligonucleotide-based molecular barcodes

Short DNA oligonucleotides (~4 mer) have been used to index samples from different sources, such as in multiplex sequencing. Presently, longer oligonucleotides (8–12 mer) are being used as molecular barcodes with which to distinguish among raw DNA molecules in many high-tech sequence analyses, inclu...

Bibliographic Details
Main Authors:	Yang, In Seok; Bae, Sang Won; Park, BeumJin; Kim, Sangwoo
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7891705/ https://www.ncbi.nlm.nih.gov/pubmed/33600481 http://dx.doi.org/10.1371/journal.pone.0246354

_version_	1783652755725877248
author	Yang, In Seok; Bae, Sang Won; Park, BeumJin; Kim, Sangwoo
author_facet	Yang, In Seok; Bae, Sang Won; Park, BeumJin; Kim, Sangwoo
author_sort	Yang, In Seok
collection	PubMed
description	Short DNA oligonucleotides (~4 mer) have been used to index samples from different sources, such as in multiplex sequencing. Presently, longer oligonucleotides (8–12 mer) are being used as molecular barcodes to distinguish among raw DNA molecules in many high-tech sequence analyses, including low-frequency mutation detection, quantitative transcriptome analysis, and single-cell sequencing. Although molecular barcodes with random sequences have some advantages, this approach makes it impossible to know the exact sequences used in an experiment and can lead to inaccurate interpretation, because unexpected mutations in the barcodes cause them to be misclustered. The present study introduces a tool developed for selecting an optimal barcode subset during molecular barcoding. The program considers five barcode factors: GC content, homopolymers, simple sequence repeats with repeated units of dinucleotides, Hamming distance, and complementarity between barcodes. To evaluate a selected barcode set, penalty scores for the factors are defined based on their distributions observed in random barcodes. The algorithm employed in the program comprises two steps: i) random generation of an initial set and ii) optimal barcode selection via iterative replacement. Users can execute the program by inputting the barcode length and the number of barcodes to be generated. For advanced use, the program also accepts a user's own values for other parameters, including penalty scores, allowing it to be applied under various conditions. In many test runs to obtain 100,000 barcodes with lengths of 12 nucleotides, the program was fast enough to generate optimal barcode sequences using only a desktop PC. We also showed that VFOS has comparable performance, flexibility in program running, consideration of simple sequence repeats, and fast computation time in comparison with two other tools (DNABarcodes and FreeBarcodes). Owing to the versatility and fast performance of the program, we expect that many researchers will opt to apply it for selecting optimal barcode sets during their experiments, including next-generation sequencing.
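The two-step procedure in the description (a random initial set, then iterative replacement guided by penalty scores over the five factors) can be sketched roughly as follows. This is a minimal illustration, not the published program: all function names, the thresholds (GC content between 0.4 and 0.6, runs of three or more, a minimum Hamming distance of 3), and the unit penalty of 1 per violation are hypothetical, whereas the abstract states that the actual penalty scores are derived from distributions observed in random barcodes. Complementarity is approximated here by the Hamming distance to the reverse complement, which is also an assumption.

```python
import random
from itertools import combinations

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_content(seq):
    """Fraction of G and C bases in the barcode."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (seq must be non-empty)."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def max_dinucleotide_repeat(seq):
    """Most consecutive copies of a two-base unit, e.g. ACACAC gives 3.
    Units such as AA are skipped because they count as homopolymers."""
    best = 0
    for start in range(len(seq) - 1):
        unit = seq[start:start + 2]
        if unit[0] == unit[1]:
            continue
        count, pos = 1, start + 2
        while seq[pos:pos + 2] == unit:
            count += 1
            pos += 2
        best = max(best, count)
    return best

def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def barcode_penalty(seq):
    """Per-barcode penalty; thresholds are illustrative assumptions."""
    penalty = 0
    if not 0.4 <= gc_content(seq) <= 0.6:
        penalty += 1
    if longest_homopolymer(seq) >= 3:
        penalty += 1
    if max_dinucleotide_repeat(seq) >= 3:
        penalty += 1
    return penalty

def pair_penalty(a, b, min_dist=3):
    """Pairwise penalty for barcodes that are too similar or too complementary."""
    penalty = 0
    if hamming(a, b) < min_dist:
        penalty += 1
    if hamming(a, reverse_complement(b)) < min_dist:
        penalty += 1
    return penalty

def set_penalty(barcodes, min_dist=3):
    """Total penalty of a barcode set (quadratic in set size; fine for a sketch)."""
    total = sum(barcode_penalty(b) for b in barcodes)
    total += sum(pair_penalty(a, b, min_dist) for a, b in combinations(barcodes, 2))
    return total

def random_barcode(length, rng):
    return "".join(rng.choice(BASES) for _ in range(length))

def select_barcodes(n, length, iterations=2000, seed=0):
    """Step i: random initial set. Step ii: replace one barcode at a time,
    keeping a replacement only if it lowers the total penalty.
    Requires n <= 4 ** length, otherwise n distinct barcodes do not exist."""
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < n:
        chosen.add(random_barcode(length, rng))
    barcodes = sorted(chosen)
    score = set_penalty(barcodes)
    for _ in range(iterations):
        if score == 0:
            break
        idx = rng.randrange(n)
        candidate = random_barcode(length, rng)
        if candidate in barcodes:
            continue
        trial = barcodes[:idx] + [candidate] + barcodes[idx + 1:]
        trial_score = set_penalty(trial)
        if trial_score < score:
            barcodes, score = trial, trial_score
    return barcodes, score
```

Calling `select_barcodes(20, 12)` returns a list of 20 distinct 12-mers together with the set's remaining penalty. Recomputing the whole set penalty on every trial keeps the sketch short but would be far too slow for the 100,000-barcode runs mentioned in the description; a practical implementation would update only the terms involving the replaced barcode.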
format	Online Article Text
id	pubmed-7891705
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-7891705 2021-02-25 Development of a program for in silico optimized selection of oligonucleotide-based molecular barcodes Yang, In Seok; Bae, Sang Won; Park, BeumJin; Kim, Sangwoo PLoS One Research Article Public Library of Science 2021-02-18 /pmc/articles/PMC7891705/ /pubmed/33600481 http://dx.doi.org/10.1371/journal.pone.0246354 Text en © 2021 Yang et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Yang, In Seok Bae, Sang Won Park, BeumJin Kim, Sangwoo Development of a program for in silico optimized selection of oligonucleotide-based molecular barcodes
title	Development of a program for in silico optimized selection of oligonucleotide-based molecular barcodes
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7891705/ https://www.ncbi.nlm.nih.gov/pubmed/33600481 http://dx.doi.org/10.1371/journal.pone.0246354
