SIEVE: identifying robust single cell variable genes for single-cell RNA sequencing data

Bibliographic Details
Main authors: Zhang, Yinan; Xie, Xiaowei; Wu, Peng; Zhu, Ping
Format: Online; Article; Text
Language: English
Published: Lippincott Williams & Wilkins, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8974938/
https://www.ncbi.nlm.nih.gov/pubmed/35402832
http://dx.doi.org/10.1097/BS9.0000000000000072
Collection: PubMed
Abstract: Single-cell RNA-seq data analysis generally involves quality control, normalization, highly variable gene screening, dimensionality reduction, and clustering. Of these steps, the downstream analyses (dimensionality reduction and clustering) are sensitive to which highly variable genes are selected. Although a growing number of tools for selecting highly variable genes have been developed, an evaluation of their performance and a general strategy are lacking. Here, we compare the performance of nine commonly used methods for screening variable genes using single-cell RNA-seq data from hematopoietic stem/progenitor cells and mature blood cells, and find that SCHS outperforms the other methods in reproducibility and accuracy. However, this method favors highly expressed genes. We further propose a new strategy, SIEVE (SIngle-cEll Variable gEnes), which uses multiple rounds of random sampling to minimize stochastic noise and identify a robust set of variable genes. Moreover, SIEVE recovers lowly expressed genes as variable genes and substantially improves the accuracy of single-cell classification, especially for methods with lower reproducibility. The SIEVE software is freely available at https://github.com/YinanZhang522/SIEVE.
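The repeated random-sampling idea described in the abstract can be illustrated with a small Python sketch. This is not the SIEVE implementation (the authors' code is in the GitHub repository above). It assumes a plain dispersion score (variance-to-mean ratio) as a stand-in for whichever highly variable gene method is being stabilized. The function names and the parameters `n_rounds`, `frac`, `top_k` and `min_freq` are hypothetical. The sketch keeps only genes that are ranked as highly variable in most random subsamples of cells.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def dispersion(values):
    """Variance-to-mean ratio; a simple stand-in for any HVG scoring method."""
    m = mean(values)
    return pvariance(values, mu=m) / m if m > 0 else 0.0


def top_variable_genes(expr, cell_idx, top_k):
    """Score every gene on the sampled cells only and return the top_k names."""
    scores = {
        gene: dispersion([values[i] for i in cell_idx])
        for gene, values in expr.items()
    }
    # Sort by descending score, breaking ties by gene name for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]


def robust_variable_genes(expr, n_rounds=50, frac=0.7, top_k=2, min_freq=0.8, seed=0):
    """Hypothetical consensus selection: keep genes ranked among the top_k
    variable genes in at least min_freq of the random cell subsamples.

    expr maps gene name -> list of per-cell expression values (same cell order).
    """
    rng = random.Random(seed)
    n_cells = len(next(iter(expr.values())))
    n_sample = max(2, int(frac * n_cells))  # assumes at least 2 cells
    counts = Counter()
    for _ in range(n_rounds):
        # Draw a random subset of cells without replacement.
        cell_idx = rng.sample(range(n_cells), n_sample)
        counts.update(top_variable_genes(expr, cell_idx, top_k))
    return sorted(g for g, c in counts.items() if c / n_rounds >= min_freq)


if __name__ == "__main__":
    # Toy, made-up expression values for four genes across ten cells.
    toy = {
        "GATA1": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
        "HBB": [0, 20, 0, 20, 0, 20, 0, 20, 0, 20],
        "ACTB": [50, 51, 50, 51, 50, 51, 50, 51, 50, 51],
        "GAPDH": [30] * 10,
    }
    print(robust_variable_genes(toy))  # ['GATA1', 'HBB']
```

Counting how often each gene survives across subsamples is one simple way to damp the sampling noise a single run would carry. In practice, the dispersion score would be replaced by a proper method on normalized data, and the thresholds would need tuning.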
Record ID: pubmed-8974938
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Blood Sci
Article type: Research Article
Publication date: 2021-04-28
Copyright: © 2021 The Authors. Published by Wolters Kluwer Health Inc., on behalf of the Chinese Association for Blood Sciences.
License: This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CCBY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal. https://creativecommons.org/licenses/by-nc-nd/4.0/