Barcode identification for single cell genomics
BACKGROUND: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computat...
Main Authors: | Tambe, Akshay, Pachter, Lior |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337828/ https://www.ncbi.nlm.nih.gov/pubmed/30654736 http://dx.doi.org/10.1186/s12859-019-2612-0 |
_version_ | 1783388340862582784 |
---|---|
author | Tambe, Akshay Pachter, Lior |
author_facet | Tambe, Akshay Pachter, Lior |
author_sort | Tambe, Akshay |
collection | PubMed |
description | BACKGROUND: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. RESULTS: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. CONCLUSION: We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2612-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6337828 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6337828 2019-01-23 Barcode identification for single cell genomics Tambe, Akshay Pachter, Lior BMC Bioinformatics Software BACKGROUND: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. RESULTS: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. CONCLUSION: We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2612-0) contains supplementary material, which is available to authorized users. BioMed Central 2019-01-17 /pmc/articles/PMC6337828/ /pubmed/30654736 http://dx.doi.org/10.1186/s12859-019-2612-0 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Tambe, Akshay Pachter, Lior Barcode identification for single cell genomics |
title | Barcode identification for single cell genomics |
title_full | Barcode identification for single cell genomics |
title_fullStr | Barcode identification for single cell genomics |
title_full_unstemmed | Barcode identification for single cell genomics |
title_short | Barcode identification for single cell genomics |
title_sort | barcode identification for single cell genomics |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337828/ https://www.ncbi.nlm.nih.gov/pubmed/30654736 http://dx.doi.org/10.1186/s12859-019-2612-0 |
work_keys_str_mv | AT tambeakshay barcodeidentificationforsinglecellgenomics AT pachterlior barcodeidentificationforsinglecellgenomics |
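
The description field above states that circularizing a barcode can yield error-free k-mers even when k is large relative to the barcode length. The following is a minimal sketch of that observation only, not Sircel's implementation: the function names, the 12-base barcode, and the choice of k = 8 are all made up for illustration, and the de Bruijn graph traversal itself is not shown.

```python
def linear_kmers(barcode: str, k: int) -> list[str]:
    """Return the k-mers of `barcode` read as an ordinary linear sequence."""
    return [barcode[i : i + k] for i in range(len(barcode) - k + 1)]


def circular_kmers(barcode: str, k: int) -> list[str]:
    """Return the k-mers of `barcode` read as a circular sequence."""
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and len(barcode)")
    # Append the first k-1 bases so that windows can wrap past the end.
    extended = barcode + barcode[: k - 1]
    return [extended[i : i + k] for i in range(len(barcode))]


# Hypothetical data: a 12-base barcode and a read of it with one mismatch
# at 0-based position 7 (A -> T).
true_barcode = "ACGTTGCAAGCT"
read_barcode = "ACGTTGCTAGCT"
k = 8

# k-mers shared between the true barcode and the erroneous read.
shared_linear = set(linear_kmers(true_barcode, k)) & set(linear_kmers(read_barcode, k))
shared_circular = set(circular_kmers(true_barcode, k)) & set(circular_kmers(read_barcode, k))

print(len(shared_linear), len(shared_circular))
```

In this toy case every linear 8-mer overlaps the mismatch, so the read shares no k-mers with the true barcode, while the wrapped windows that start after the error still match.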