ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping
Main authors: | Song, Dongyuan; Li, Kexin; Ge, Xinzhou; Li, Jingyi Jessica |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10401959/ https://www.ncbi.nlm.nih.gov/pubmed/37546812 http://dx.doi.org/10.1101/2023.07.21.550107 |
_version_ | 1785084778708467712 |
---|---|
author | Song, Dongyuan; Li, Kexin; Ge, Xinzhou; Li, Jingyi Jessica |
collection | PubMed |
description | In typical single-cell RNA-seq (scRNA-seq) data analysis, a clustering algorithm is applied to find putative cell types as clusters, and then a statistical differential expression (DE) test is used to identify the differentially expressed (DE) genes between the cell clusters. However, this common procedure uses the same data twice, an issue known as “double dipping”: the same data is used to define both cell clusters and DE genes, leading to false-positive DE genes even when the cell clusters are spurious. To overcome this challenge, we propose ClusterDE, a post-clustering DE test for controlling the false discovery rate (FDR) of identified DE genes regardless of clustering quality. The core idea of ClusterDE is to generate real-data-based synthetic null data with only one cluster, as a counterfactual in contrast to the real data, for evaluating the whole procedure of clustering followed by a DE test. Using comprehensive simulation and real data analysis, we show that ClusterDE has not only solid FDR control but also the ability to find cell-type marker genes that are biologically meaningful. ClusterDE is fast, transparent, and adaptive to a wide range of clustering algorithms and DE tests. Besides scRNA-seq data, ClusterDE is generally applicable to post-clustering DE analysis, including single-cell multi-omics data analysis. |
format | Online Article Text |
id | pubmed-10401959 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10401959 2023-08-05; ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping; Song, Dongyuan; Li, Kexin; Ge, Xinzhou; Li, Jingyi Jessica; bioRxiv; Article; Cold Spring Harbor Laboratory 2023-07-25; /pmc/articles/PMC10401959/ /pubmed/37546812 http://dx.doi.org/10.1101/2023.07.21.550107; Text; en; This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10401959/ https://www.ncbi.nlm.nih.gov/pubmed/37546812 http://dx.doi.org/10.1101/2023.07.21.550107 |
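The description states ClusterDE's core idea only at a high level: run the same clustering-then-DE procedure on the real data and on a real-data-based synthetic null with one cluster, and contrast the two. The Python sketch below illustrates that idea under loudly stated assumptions; it is not the authors' implementation. Everything in it is hypothetical: toy Gaussian data with one population and a continuous latent factor stand in for scRNA-seq counts; a hand-written 2-means and a Welch t-test (normal approximation) stand in for the clustering algorithm and DE test; a multivariate Gaussian matched to the real mean and covariance stands in for ClusterDE's null generator; and the per-gene contrast score (difference of -log10 p-values) with a knockoff+-style cutoff is an assumed way of comparing real and null results, since this record does not say how ClusterDE does it. All function names are illustrative.

```python
import math
import random
import statistics


def simulate_one_population(n_cells, n_factor_genes, n_noise_genes, rng):
    # Hypothetical toy data with ONE population: a continuous latent factor z
    # drives (and correlates) the first genes; the remaining genes are pure noise.
    cells = []
    for _ in range(n_cells):
        z = rng.gauss(0.0, 1.0)
        factor = [z + 0.5 * rng.gauss(0.0, 1.0) for _ in range(n_factor_genes)]
        cells.append(factor + [rng.gauss(0.0, 1.0) for _ in range(n_noise_genes)])
    return cells


def two_means(cells, iters=25, seed=0):
    # Stand-in clustering: Lloyd's k-means with k = 2, returning a 0/1 label per cell.
    centers = [list(c) for c in random.Random(seed).sample(cells, 2)]
    labels = [0] * len(cells)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda k: sum((x - m) ** 2 for x, m in zip(cell, centers[k])))
                  for cell in cells]
        for k in (0, 1):
            members = [c for c, lab in zip(cells, labels) if lab == k]
            if members:
                centers[k] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def welch_pvalues(cells, labels):
    # Stand-in DE test: per-gene two-sided Welch t-test, normal approximation to the t law.
    g0 = [c for c, lab in zip(cells, labels) if lab == 0]
    g1 = [c for c, lab in zip(cells, labels) if lab == 1]
    if len(g0) < 2 or len(g1) < 2:
        return [1.0] * len(cells[0])
    pvals = []
    for a, b in zip(zip(*g0), zip(*g1)):
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        t = (statistics.fmean(a) - statistics.fmean(b)) / se
        pvals.append(math.erfc(abs(t) / math.sqrt(2.0)))
    return pvals


def gaussian_null(cells, rng):
    # Stand-in synthetic null: ONE multivariate Gaussian matching the real mean and
    # covariance, so gene-gene correlation is kept but there is only one cluster.
    n, p = len(cells), len(cells[0])
    mu = [statistics.fmean(col) for col in zip(*cells)]
    cen = [[x - m for x, m in zip(c, mu)] for c in cells]
    cov = [[sum(r[i] * r[j] for r in cen) / (n - 1) for j in range(p)] for i in range(p)]
    chol = [[0.0] * p for _ in range(p)]  # lower-triangular Cholesky factor of cov
    for i in range(p):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][i] = math.sqrt(max(cov[i][i] - s, 1e-12))
            else:
                chol[i][j] = (cov[i][j] - s) / chol[j][j]
    null = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        null.append([mu[i] + sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(p)])
    return null


def contrast_scores(p_real, p_null):
    # Assumed contrast: -log10 p on the real data minus -log10 p on the synthetic null.
    def score(p):
        return -math.log10(max(p, 1e-300))  # clip so an underflowed p == 0.0 stays finite
    return [score(a) - score(b) for a, b in zip(p_real, p_null)]


def knockoff_plus_select(scores, q):
    # Knockoff+-style cutoff: smallest t with (1 + #{C <= -t}) / #{C >= t} <= q.
    for t in sorted(abs(c) for c in scores if c != 0):
        n_pos = sum(c >= t for c in scores)
        n_neg = sum(c <= -t for c in scores)
        if n_pos > 0 and (1 + n_neg) / n_pos <= q:
            return [j for j, c in enumerate(scores) if c >= t]
    return []


def naive_and_contrast(cells, q=0.1, seed=1):
    p_real = welch_pvalues(cells, two_means(cells))  # double dipping: cluster and test the same data
    null = gaussian_null(cells, random.Random(seed))
    p_null = welch_pvalues(null, two_means(null))    # identical procedure on the one-cluster null
    naive = [j for j, p in enumerate(p_real) if p < 0.05]
    return naive, knockoff_plus_select(contrast_scores(p_real, p_null), q)


cells = simulate_one_population(200, 8, 12, random.Random(0))
naive, selected = naive_and_contrast(cells)
print("naive p < 0.05:", naive)
print("contrast-selected:", selected)
```

On data like this, the naive list picks up the factor-driven genes 0 to 7 even though the cells come from a single population, which is the double-dipping inflation the description refers to. With only 20 genes, a knockoff+-style cutoff at q = 0.1 can only return a set of at least 10 genes with no negative score beyond the cutoff, so the contrast step here is illustrative rather than a calibrated test.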