Barcode identification for single cell genomics

BACKGROUND: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computat...

Bibliographic Details
Main Authors: Tambe, Akshay, Pachter, Lior
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337828/
https://www.ncbi.nlm.nih.gov/pubmed/30654736
http://dx.doi.org/10.1186/s12859-019-2612-0
_version_ 1783388340862582784
author Tambe, Akshay
Pachter, Lior
author_facet Tambe, Akshay
Pachter, Lior
author_sort Tambe, Akshay
collection PubMed
description BACKGROUND: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. RESULTS: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. CONCLUSION: We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2612-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6337828
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6337828 2019-01-23 Barcode identification for single cell genomics Tambe, Akshay Pachter, Lior BMC Bioinformatics Software BACKGROUND: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. RESULTS: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. CONCLUSION: We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2612-0) contains supplementary material, which is available to authorized users. BioMed Central 2019-01-17 /pmc/articles/PMC6337828/ /pubmed/30654736 http://dx.doi.org/10.1186/s12859-019-2612-0 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Tambe, Akshay
Pachter, Lior
Barcode identification for single cell genomics
title Barcode identification for single cell genomics
title_full Barcode identification for single cell genomics
title_fullStr Barcode identification for single cell genomics
title_full_unstemmed Barcode identification for single cell genomics
title_short Barcode identification for single cell genomics
title_sort barcode identification for single cell genomics
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337828/
https://www.ncbi.nlm.nih.gov/pubmed/30654736
http://dx.doi.org/10.1186/s12859-019-2612-0
work_keys_str_mv AT tambeakshay barcodeidentificationforsinglecellgenomics
AT pachterlior barcodeidentificationforsinglecellgenomics
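
Illustrative note: the abstract's central observation is that circularizing a barcode can yield error-free k-mers even when k is large relative to the barcode length. The Python sketch below is only a toy illustration of that observation, not the Sircel implementation; the function names, the 12-base barcode, the choice of k = 8, and the single simulated mismatch are all assumptions made for the example. It does not build or traverse a de Bruijn graph.

```python
def linear_kmers(seq, k):
    """Return all k-mers of seq read as an ordinary linear string."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def circular_kmers(seq, k):
    """Return the len(seq) k-mers of seq read as a circle.

    The first k - 1 bases are appended to the end so that k-mers
    starting near the 3' end wrap around to the 5' end.
    """
    if not 0 < k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    wrapped = seq + seq[:k - 1]
    return [wrapped[i:i + k] for i in range(len(seq))]


# Hypothetical 12-base barcode and a read of it with one mismatch
# in the middle (index 6, C -> T). Both sequences are made up.
true_barcode = "ACGTTGCAAGCT"
observed_barcode = "ACGTTGTAAGCT"
k = 8  # deliberately large relative to the barcode length

# k-mers shared between the true barcode and the erroneous read.
shared_linear = set(linear_kmers(true_barcode, k)) & set(
    linear_kmers(observed_barcode, k)
)
shared_circular = set(circular_kmers(true_barcode, k)) & set(
    circular_kmers(observed_barcode, k)
)

print("error-free linear k-mers:  ", len(shared_linear))
print("error-free circular k-mers:", len(shared_circular))
```

In this toy case every linear 8-mer spans the mismatched base, whereas some of the wrapped-around k-mers avoid it, which is the property the abstract attributes to circularization.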