Benchmarking UMI-based single-cell RNA-seq preprocessing workflows

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing w...

Bibliographic Details
Main authors: You, Yue; Tian, Luyi; Su, Shian; Dong, Xueyi; Jabbari, Jafar S.; Hickey, Peter F.; Ritchie, Matthew E.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8672463/
https://www.ncbi.nlm.nih.gov/pubmed/34906205
http://dx.doi.org/10.1186/s13059-021-02552-3
author You, Yue
Tian, Luyi
Su, Shian
Dong, Xueyi
Jabbari, Jafar S.
Hickey, Peter F.
Ritchie, Matthew E.
author_sort You, Yue
collection PubMed
description BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied. RESULTS: Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets yielding different biological complexity levels generated by CEL-Seq2 and 10x Chromium platforms. We compare these workflows in terms of their quantification properties directly and their impact on normalization and clustering by evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows compared vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that provided the ground truth in our analysis. CONCLUSIONS: In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s13059-021-02552-3).
format Online
Article
Text
id pubmed-8672463
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8672463 2021-12-15 Benchmarking UMI-based single-cell RNA-seq preprocessing workflows You, Yue Tian, Luyi Su, Shian Dong, Xueyi Jabbari, Jafar S. Hickey, Peter F. Ritchie, Matthew E. Genome Biol Research
BioMed Central 2021-12-14 /pmc/articles/PMC8672463/ /pubmed/34906205 http://dx.doi.org/10.1186/s13059-021-02552-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Benchmarking UMI-based single-cell RNA-seq preprocessing workflows
topic Research