
Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the most dominant techniques in modern transcriptome assessment. In particular, 10X Genomics’ Chromium system, with its high-throughput, turnkey approach and thorough user guide, has made this cutting-edge technique accessible...


Bibliographic Details
Main Authors:	Shainer, Inbal; Stemmer, Manuel
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8439043/ https://www.ncbi.nlm.nih.gov/pubmed/34521337 http://dx.doi.org/10.1186/s12864-021-07930-6

_version_	1783752460882411520
author	Shainer, Inbal Stemmer, Manuel
author_facet	Shainer, Inbal Stemmer, Manuel
author_sort	Shainer, Inbal
collection	PubMed
description	BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the most dominant techniques in modern transcriptome assessment. In particular, 10X Genomics’ Chromium system, with its high-throughput, turnkey approach and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell-filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with the standard 10X Genomics’ Cell Ranger pipeline. RESULTS: In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that had previously gone undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types. CONCLUSION: While Cell Ranger favors higher cell numbers, using kallisto results in datasets with higher median gene detection per cell. We could demonstrate that cell type identification was not hampered by the lower cell count, but in fact improved as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to an improved classification of cell types. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07930-6.
format	Online Article Text
id	pubmed-8439043
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8439043 2021-09-14 Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets Shainer, Inbal Stemmer, Manuel BMC Genomics Methodology Article BioMed Central 2021-09-14 /pmc/articles/PMC8439043/ /pubmed/34521337 http://dx.doi.org/10.1186/s12864-021-07930-6 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Methodology Article Shainer, Inbal Stemmer, Manuel Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
title	Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
title_full	Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
title_fullStr	Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
title_full_unstemmed	Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
title_short	Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets
title_sort	choice of pre-processing pipeline influences clustering quality of scrna-seq datasets
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8439043/ https://www.ncbi.nlm.nih.gov/pubmed/34521337 http://dx.doi.org/10.1186/s12864-021-07930-6
work_keys_str_mv	AT shainerinbal choiceofpreprocessingpipelineinfluencesclusteringqualityofscrnaseqdatasets AT stemmermanuel choiceofpreprocessingpipelineinfluencesclusteringqualityofscrnaseqdatasets
