
ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping

In typical single-cell RNA-seq (scRNA-seq) data analysis, a clustering algorithm is applied to find putative cell types as clusters, and then a statistical differential expression (DE) test is employed to identify the differentially expressed (DE) genes between the cell clusters. However, this commo...
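The two-step procedure summarized above (cluster the cells, then test the same cells for DE between clusters) can produce significant "markers" even when the data contain no cell types at all. The Python sketch below is a toy illustration, not code from the paper. It assumes Gaussian expression values rather than counts, uses a naive 1-D two-means split as a stand-in for a clustering algorithm, and approximates a Welch t-test with a normal reference distribution. All function names (`welch_t_pvalue`, `two_means_1d`, `simulate_double_dipping`) are hypothetical.

```python
import math
import random
import statistics


def welch_t_pvalue(a, b):
    """Two-sided Welch t-test p-value, using a normal approximation (adequate for large groups)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))


def two_means_1d(values, iters=20):
    """Naive 1-D k-means with k=2 (hypothetical stand-in for a clustering algorithm)."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        c0 = statistics.mean(v for v, lab in zip(values, labels) if lab == 0)
        c1 = statistics.mean(v for v, lab in zip(values, labels) if lab == 1)
    return labels


def simulate_double_dipping(n_cells=400, n_genes=50, seed=1):
    """Cluster and test the SAME cells; returns one DE p-value per gene."""
    rng = random.Random(seed)
    # One homogeneous population: every gene is i.i.d. N(0, 1) in every cell (no true cell types).
    cells = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]
    # Step 1 ("clustering"): split cells on a score built from the first five genes.
    labels = two_means_1d([sum(cell[:5]) for cell in cells])
    # Step 2 ("DE test"): compare every gene between the two clusters, reusing the same data.
    pvals = []
    for g in range(n_genes):
        a = [cell[g] for cell, lab in zip(cells, labels) if lab == 0]
        b = [cell[g] for cell, lab in zip(cells, labels) if lab == 1]
        pvals.append(welch_t_pvalue(a, b))
    return pvals


if __name__ == "__main__":
    pvals = simulate_double_dipping()
    print("genes used for clustering:", [f"{p:.1e}" for p in pvals[:5]])
    print("other genes with p < 0.05:", sum(p < 0.05 for p in pvals[5:]))
```

Although only one population is simulated, the genes that drove the split (the first five here) should come out with very small p-values, which is the kind of false-positive inflation the record describes; the remaining genes behave roughly like null genes.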


Bibliographic Details
Main Authors:	Song, Dongyuan; Li, Kexin; Ge, Xinzhou; Li, Jingyi Jessica
Format:	Online Article Text
Language:	English
Published:	American Journal Experts 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418557/ https://www.ncbi.nlm.nih.gov/pubmed/37577698 http://dx.doi.org/10.21203/rs.3.rs-3211191/v1

author	Song, Dongyuan; Li, Kexin; Ge, Xinzhou; Li, Jingyi Jessica
collection	PubMed
description	In typical single-cell RNA-seq (scRNA-seq) data analysis, a clustering algorithm is applied to find putative cell types as clusters, and then a statistical differential expression (DE) test is employed to identify the differentially expressed (DE) genes between the cell clusters. However, this common procedure uses the same data twice, an issue known as “double dipping”: the same data is used twice to define cell clusters as potential cell types and DE genes as potential cell-type marker genes, leading to false-positive cell-type marker genes even when the cell clusters are spurious. To overcome this challenge, we propose ClusterDE, a post-clustering DE method for controlling the false discovery rate (FDR) of identified DE genes regardless of clustering quality, which can work as an add-on to popular pipelines such as Seurat. The core idea of ClusterDE is to generate real-data-based synthetic null data containing only one cluster, as contrast to the real data, for evaluating the whole procedure of clustering followed by a DE test. Using comprehensive simulation and real data analysis, we show that ClusterDE has not only solid FDR control but also the ability to identify cell-type marker genes as top DE genes and distinguish them from housekeeping genes. ClusterDE is fast, transparent, and adaptive to a wide range of clustering algorithms and DE tests. Besides scRNA-seq data, ClusterDE is generally applicable to post-clustering DE analysis, including single-cell multi-omics data analysis.
format	Online Article Text
id	pubmed-10418557
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Journal Experts
record_format	MEDLINE/PubMed
spelling	pubmed-10418557 (2023-08-12); Res Sq; American Journal Experts, 2023-08-02; Text; en. This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title	ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping
topic	Article
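The description field above states ClusterDE's core idea: generate synthetic null data from the real data that contain only one cluster, run the same clustering and DE test on both, and contrast the results. The Python sketch below covers only that final contrast step, under stated assumptions: each gene is assumed to receive one p-value from the real data and one from the synthetic null, and the cutoff is assumed to follow a Barber–Candès (knockoff+-style) rule on per-gene contrast scores. Neither the null-data generator nor the exact scoring used by ClusterDE is shown, and the names `contrast_scores`, `bc_threshold`, and `select_de_genes` are hypothetical.

```python
import math


def contrast_scores(p_real, p_null, floor=1e-300):
    """Per-gene contrast: -log10(p_real) - (-log10(p_null)).

    Positive when a gene looks more differential in the real data than in the
    one-cluster synthetic null; p-values are floored to avoid log10(0).
    """
    return [
        -math.log10(max(pr, floor)) + math.log10(max(pn, floor))
        for pr, pn in zip(p_real, p_null)
    ]


def bc_threshold(scores, q=0.05):
    """Smallest t > 0 with (1 + #{C <= -t}) / max(1, #{C >= t}) <= q, else math.inf.

    The estimated FDP only changes at the observed |C| values, so those are the
    only candidate cutoffs that need checking.
    """
    for t in sorted({abs(c) for c in scores if c != 0}):
        n_neg = sum(1 for c in scores if c <= -t)
        n_pos = sum(1 for c in scores if c >= t)
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return math.inf


def select_de_genes(genes, p_real, p_null, q=0.05):
    """Return genes whose contrast score clears the target-FDR cutoff q."""
    scores = contrast_scores(p_real, p_null)
    cutoff = bc_threshold(scores, q)
    return [g for g, c in zip(genes, scores) if c >= cutoff]


if __name__ == "__main__":
    genes = [f"marker{i}" for i in range(40)] + [f"null{i}" for i in range(30)]
    # Hypothetical p-values: markers are strong only in the real data; the
    # remaining genes look about equally "differential" in real and null data.
    p_real = [1e-10] * 40 + [0.3, 0.6] * 15
    p_null = [0.5] * 40 + [0.6, 0.3] * 15
    print(select_de_genes(genes, p_real, p_null, q=0.05))
```

Genes whose real-data p-values are far smaller than their null-data counterparts get large positive scores. Genes that look just as "differential" in the one-cluster null data, which is what double dipping produces, get scores near zero or below and feed the false-discovery estimate instead of the selected set.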

