
Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin’s approach to UMI deduplication con...


Bibliographic Details
Main authors: Srivastava, Avi, Malik, Laraib, Smith, Tom, Sudbery, Ian, Patro, Rob
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437997/
https://www.ncbi.nlm.nih.gov/pubmed/30917859
http://dx.doi.org/10.1186/s13059-019-1670-y
_version_ 1783407037183426560
author Srivastava, Avi
Malik, Laraib
Smith, Tom
Sudbery, Ian
Patro, Rob
author_facet Srivastava, Avi
Malik, Laraib
Smith, Tom
Sudbery, Ian
Patro, Rob
author_sort Srivastava, Avi
collection PubMed
description We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin’s approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools which discard gene-ambiguous reads and improves the accuracy of gene abundance estimates. Alevin is considerably faster, typically eight times, than existing gene quantification approaches, while also using less memory. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1670-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6437997
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-64379972019-04-08 Alevin efficiently estimates accurate gene abundances from dscRNA-seq data Srivastava, Avi Malik, Laraib Smith, Tom Sudbery, Ian Patro, Rob Genome Biol Method We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin’s approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools which discard gene-ambiguous reads and improves the accuracy of gene abundance estimates. Alevin is considerably faster, typically eight times, than existing gene quantification approaches, while also using less memory. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1670-y) contains supplementary material, which is available to authorized users. BioMed Central 2019-03-27 /pmc/articles/PMC6437997/ /pubmed/30917859 http://dx.doi.org/10.1186/s13059-019-1670-y Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Method
Srivastava, Avi
Malik, Laraib
Smith, Tom
Sudbery, Ian
Patro, Rob
Alevin efficiently estimates accurate gene abundances from dscRNA-seq data
title Alevin efficiently estimates accurate gene abundances from dscRNA-seq data
title_full Alevin efficiently estimates accurate gene abundances from dscRNA-seq data
title_fullStr Alevin efficiently estimates accurate gene abundances from dscRNA-seq data
title_full_unstemmed Alevin efficiently estimates accurate gene abundances from dscRNA-seq data
title_short Alevin efficiently estimates accurate gene abundances from dscRNA-seq data
title_sort alevin efficiently estimates accurate gene abundances from dscrna-seq data
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6437997/
https://www.ncbi.nlm.nih.gov/pubmed/30917859
http://dx.doi.org/10.1186/s13059-019-1670-y
work_keys_str_mv AT srivastavaavi alevinefficientlyestimatesaccurategeneabundancesfromdscrnaseqdata
AT maliklaraib alevinefficientlyestimatesaccurategeneabundancesfromdscrnaseqdata
AT smithtom alevinefficientlyestimatesaccurategeneabundancesfromdscrnaseqdata
AT sudberyian alevinefficientlyestimatesaccurategeneabundancesfromdscrnaseqdata
AT patrorob alevinefficientlyestimatesaccurategeneabundancesfromdscrnaseqdata