An evaluation of statistical differential analysis methods in single-cell RNA-seq data

Bibliographic Details
Main authors: Li, Dongmei; Zand, Martin; Dye, Timothy; Goniewicz, Maciej; Rahman, Irfan; Xie, Zidian
Format: Online Article Text
Language: English
Published: American Journal Experts 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055642/
https://www.ncbi.nlm.nih.gov/pubmed/36993457
http://dx.doi.org/10.21203/rs.3.rs-2670717/v1
Description
Summary:
BACKGROUND: Single-cell RNA sequencing has gained popularity in recent years. Compared to bulk RNA-Seq, single-cell RNA sequencing allows gene expression to be measured within individual cells rather than as mean expression levels across all cells in the sample. Thus, cell-to-cell variation in gene expression can be examined. Differential gene expression analysis remains the major purpose of most single-cell RNA sequencing experiments, and many methods have been developed in recent years to conduct it on single-cell RNA sequencing data.
RESULTS: Through simulation studies and real data examples, we evaluated the performance of five popular open-source methods used for differential gene expression analysis in single-cell RNA sequencing data. The five methods were DEsingle (a zero-inflated negative binomial model), Linnorm (an empirical Bayes method on transformed count data using the limma package), monocle (an approximate chi-square likelihood ratio test), MAST (a generalized linear hurdle model), and DESeq2 (a generalized linear model with an empirical Bayes approach, also commonly used for bulk RNA sequencing differential expression analyses). We assessed false discovery rate (FDR) control, sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUROC) for all five methods under different sample sizes, distribution assumptions, and proportions of zeros in the data.
CONCLUSIONS: When the data followed negative binomial distributions, the MAST method performed best among the five methods compared, with the largest AUROC values across all tested sample sizes and proportions of truly differentially expressed genes. When the sample size increased to 100 in each group, the MAST method showed the best performance, with the highest AUROC regardless of the data distribution. If excess zeros were filtered out before the differential expression analysis, DEsingle, Linnorm, and DESeq2 performed relatively better than MAST and monocle, with higher AUROC values.
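
Note: the evaluation criteria named in the summary (FDR, sensitivity, specificity, accuracy, AUROC) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `confusion_metrics` and `auroc`, the 0.05 cutoff, and the toy labels and adjusted p-values are all assumptions made for illustration. For a single simulated data set, the "fdr" value below is the observed false discovery proportion; FDR control would be judged by averaging it over many simulation replicates.

```python
from typing import Dict, Sequence


def confusion_metrics(truth: Sequence[bool],
                      adj_p: Sequence[float],
                      alpha: float = 0.05) -> Dict[str, float]:
    """Compare calls (adjusted p <= alpha) with the simulated ground truth."""
    called = [p <= alpha for p in adj_p]
    tp = sum(t and c for t, c in zip(truth, called))              # true DE, called DE
    fp = sum((not t) and c for t, c in zip(truth, called))        # null gene, called DE
    fn = sum(t and (not c) for t, c in zip(truth, called))        # true DE, missed
    tn = sum((not t) and (not c) for t, c in zip(truth, called))  # null gene, not called
    return {
        # Observed false discovery proportion; defined as 0 when nothing is called.
        "fdr": fp / (tp + fp) if (tp + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(truth),
    }


def auroc(truth: Sequence[bool], adj_p: Sequence[float]) -> float:
    """AUROC as the probability that a truly DE gene has a smaller
    p-value than a null gene, counting ties as one half."""
    pos = [p for t, p in zip(truth, adj_p) if t]
    neg = [p for t, p in zip(truth, adj_p) if not t]
    if not pos or not neg:
        raise ValueError("need at least one DE gene and one null gene")
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos < p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 truly DE genes followed by 5 null genes.
truth = [True, True, True, False, False, False, False, False]
adj_p = [0.001, 0.02, 0.30, 0.04, 0.50, 0.70, 0.90, 0.95]

metrics = confusion_metrics(truth, adj_p)
auc = auroc(truth, adj_p)
print(metrics)
print(round(auc, 4))
```

The pairwise AUROC computation is quadratic in the number of genes, which is adequate for a small illustration; a rank-based formula or an established library routine would be preferable for data sets with thousands of genes.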