EmptyNN: A neural network based on positive and unlabeled learning to remove cell-free droplets and recover lost cells in scRNA-seq data

Bibliographic Details
Main Authors: Yan, Fangfang; Zhao, Zhongming; Simon, Lukas M.
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369248/
https://www.ncbi.nlm.nih.gov/pubmed/34430929
http://dx.doi.org/10.1016/j.patter.2021.100311
Description
Summary: Droplet-based single-cell RNA sequencing (scRNA-seq) has significantly increased the number of cells profiled per experiment and revolutionized the study of individual transcriptomes. However, to maximize the biological signal, robust computational methods are needed to distinguish cell-free from cell-containing droplets. Here, we introduce a novel cell-calling algorithm called EmptyNN, which trains a neural network based on positive-unlabeled learning for improved filtering of barcodes. For benchmarking purposes, we leveraged cell hashing and genetic variation to provide ground truth. EmptyNN accurately removed cell-free droplets while recovering lost cell clusters, and achieved an area under the receiver operating characteristic curve (AUROC) of 94.73% and 96.30%, respectively. Comparisons to current state-of-the-art cell-calling algorithms demonstrated the superior performance of EmptyNN. EmptyNN was further applied to a single-nucleus RNA sequencing (snRNA-seq) dataset and showed good performance. Therefore, EmptyNN represents a powerful tool to enhance both scRNA-seq and snRNA-seq quality control analyses.
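
The AUROC metric quoted in the summary can be read as the probability that a randomly chosen cell-containing droplet receives a higher classifier score than a randomly chosen cell-free droplet. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `auroc` and the toy scores and labels are hypothetical and serve only to illustrate the definition.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: classifier outputs, higher = more likely cell-containing.
    labels: ground truth, 1 = cell-containing, 0 = cell-free.
    Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one droplet of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six barcodes with made-up scores and labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(f"AUROC = {auroc(scores, labels):.4f}")
```

In this toy example one cell-containing droplet (score 0.35) ranks below one cell-free droplet (score 0.6), so 8 of the 9 positive-negative pairs are ordered correctly. The pairwise form is quadratic in the number of barcodes; for real datasets a rank-based implementation would be the more practical choice.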