Indel-correcting DNA barcodes for high-throughput sequencing
Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to s...
Main Authors: | Hawkins, John A.; Jones, Stephen K.; Finkelstein, Ilya J.; Press, William H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences 2018 |
Subjects: | PNAS Plus |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6142223/ https://www.ncbi.nlm.nih.gov/pubmed/29925596 http://dx.doi.org/10.1073/pnas.1802640115 |
_version_ | 1783355826965053440 |
---|---|
author | Hawkins, John A. Jones, Stephen K. Finkelstein, Ilya J. Press, William H. |
author_sort | Hawkins, John A. |
collection | PubMed |
description | Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error–correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community. |
format | Online Article Text |
id | pubmed-6142223 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-6142223 2018-09-19 Indel-correcting DNA barcodes for high-throughput sequencing Hawkins, John A. Jones, Stephen K. Finkelstein, Ilya J. Press, William H. Proc Natl Acad Sci U S A PNAS Plus Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error–correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community. National Academy of Sciences 2018-07-03 2018-06-20 /pmc/articles/PMC6142223/ /pubmed/29925596 http://dx.doi.org/10.1073/pnas.1802640115 Text en Copyright © 2018 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | PNAS Plus Hawkins, John A. Jones, Stephen K. Finkelstein, Ilya J. Press, William H. Indel-correcting DNA barcodes for high-throughput sequencing |
title | Indel-correcting DNA barcodes for high-throughput sequencing |
topic | PNAS Plus |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6142223/ https://www.ncbi.nlm.nih.gov/pubmed/29925596 http://dx.doi.org/10.1073/pnas.1802640115 |
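
The abstract names several design constraints (balanced GC content, short homopolymer runs) and notes that Hamming-style codes mishandle indels. The following is a minimal illustrative sketch of those ideas only; it is not the FREE algorithm and not the authors' software. All function names and the thresholds (`gc_min`, `gc_max`, `max_run`) are hypothetical choices made for this example.

```python
from itertools import groupby
from math import prod


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))


def passes_filters(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                   max_run: int = 2) -> bool:
    # Thresholds are illustrative assumptions, not values from the paper.
    return (gc_min <= gc_fraction(seq) <= gc_max
            and longest_homopolymer(seq) <= max_run)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def concatenated_space(list_sizes: list[int]) -> int:
    """Number of barcodes obtained by concatenating one code from each list."""
    return prod(list_sizes)


barcode = "ACGTACGT"
# One deleted base at the start; the fixed-length read picks up one
# downstream base ("T" here, chosen arbitrarily) at the right end.
observed = "CGTACGTT"
print(hamming(barcode, observed), levenshtein(barcode, observed))
```

In this toy case a single deletion shifts every downstream base, so the fixed-length read differs from the true barcode at most positions even though only one synthesis error occurred. The edit distance stays small but still counts the extra base at the right end as a second edit, which is the situation the abstract describes when it says errors can alter the barcode length.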