Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
Main authors: | Shainer, Inbal; Stemmer, Manuel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8439043/ https://www.ncbi.nlm.nih.gov/pubmed/34521337 http://dx.doi.org/10.1186/s12864-021-07930-6 |
author | Shainer, Inbal; Stemmer, Manuel |
---|---|
collection | PubMed |
description | BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the dominant techniques in modern transcriptome assessment. In particular, 10X Genomics’ Chromium system, with its high-throughput approach, turnkey operation and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with those of the standard 10X Genomics’ Cell Ranger pipeline. RESULTS: In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that previously went undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types. CONCLUSION: While Cell Ranger favors higher cell numbers, using kallisto results in datasets with higher median gene detection per cell. We demonstrated that cell type identification was not hampered by the lower cell count, but in fact improved as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to an improved classification of cell types. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07930-6. |
id | pubmed-8439043 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Genomics (Methodology Article), BioMed Central, published 2021-09-14 |
rights | © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
topic | Methodology Article |
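The abstract compares pipelines by cell count, median genes detected per cell and total genes detected, and credits more stringent cell filtering for the improved clustering. The sketch below shows how those three metrics relate on a toy matrix. It is plain standard-library Python and assumes a small dense cells-by-genes UMI count matrix held in lists; the function names and the `min_genes` threshold are hypothetical illustrations, not the authors' code and not the actual cell-calling logic of kallisto or Cell Ranger, which work on sparse matrices with their own, more elaborate criteria.

```python
from statistics import median
from typing import Dict, List, Union

# Toy representation (an assumption for illustration): one row per cell barcode,
# one column per gene, values are UMI counts. Real pipelines write sparse matrices.
Matrix = List[List[int]]


def genes_per_cell(counts: Matrix) -> List[int]:
    """Count the genes with a nonzero value in each cell."""
    return [sum(1 for c in row if c > 0) for row in counts]


def total_genes_detected(counts: Matrix) -> int:
    """Count the genes that are nonzero in at least one cell."""
    if not counts:
        return 0
    n_genes = len(counts[0])
    return sum(1 for g in range(n_genes) if any(row[g] > 0 for row in counts))


def filter_cells(counts: Matrix, min_genes: int) -> Matrix:
    """Keep cells detecting at least `min_genes` genes (hypothetical threshold)."""
    return [row for row, n in zip(counts, genes_per_cell(counts)) if n >= min_genes]


def summarize(counts: Matrix) -> Dict[str, Union[int, float]]:
    """Report cell count, median genes per cell and total genes detected."""
    per_cell = genes_per_cell(counts)
    return {
        "cells": len(counts),
        "median_genes_per_cell": median(per_cell) if per_cell else 0,
        "total_genes_detected": total_genes_detected(counts),
    }


toy = [
    [5, 0, 3, 1],  # 3 genes detected
    [0, 0, 1, 0],  # 1 gene detected: a low-quality barcode
    [2, 4, 0, 7],  # 3 genes detected
    [0, 1, 0, 0],  # 1 gene detected: a low-quality barcode
]
print(summarize(toy))
# {'cells': 4, 'median_genes_per_cell': 2.0, 'total_genes_detected': 4}
print(summarize(filter_cells(toy, min_genes=2)))
# {'cells': 2, 'median_genes_per_cell': 3.0, 'total_genes_detected': 4}
```

On this toy input, filtering halves the cell count but raises the median genes per cell from 2.0 to 3.0 while every gene remains detected, which mirrors, at a trivially small scale, the trade-off the abstract describes between higher cell numbers and higher per-cell gene detection.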