Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the most dominant techniques in modern transcriptome assessment. In particular, 10X Genomics’ Chromium system, with its high-throughput, turnkey approach and thorough user guide, has made this cutting-edge technique accessible...

Bibliographic Details
Main authors: Shainer, Inbal; Stemmer, Manuel
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8439043/
https://www.ncbi.nlm.nih.gov/pubmed/34521337
http://dx.doi.org/10.1186/s12864-021-07930-6
author Shainer, Inbal
Stemmer, Manuel
collection PubMed
description BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the most dominant techniques in modern transcriptome assessment. In particular, 10X Genomics’ Chromium system, with its high-throughput, turnkey approach and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell-filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with the standard 10X Genomics’ Cell Ranger pipeline. RESULTS: In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that previously went undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types. CONCLUSION: While Cell Ranger favors higher cell numbers, using kallisto results in datasets with higher median gene detection per cell. We could demonstrate that cell type identification was not hampered by the lower cell count, but in fact improved as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to an improved classification of cell types. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07930-6.
format Online
Article
Text
id pubmed-8439043
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8439043 2021-09-14 Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets Shainer, Inbal Stemmer, Manuel BMC Genomics Methodology Article BioMed Central 2021-09-14 /pmc/articles/PMC8439043/ /pubmed/34521337 http://dx.doi.org/10.1186/s12864-021-07930-6 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8439043/
https://www.ncbi.nlm.nih.gov/pubmed/34521337
http://dx.doi.org/10.1186/s12864-021-07930-6