Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalizati...

Bibliographic Details
Main authors: Hafemeister, Christoph; Satija, Rahul
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927181/
https://www.ncbi.nlm.nih.gov/pubmed/31870423
http://dx.doi.org/10.1186/s13059-019-1874-1
author Hafemeister, Christoph
Satija, Rahul
collection PubMed
description Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
format Online
Article
Text
id pubmed-6927181
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6927181 2019-12-30 Genome Biol Method
BioMed Central 2019-12-23 /pmc/articles/PMC6927181/ /pubmed/31870423 http://dx.doi.org/10.1186/s13059-019-1874-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927181/
https://www.ncbi.nlm.nih.gov/pubmed/31870423
http://dx.doi.org/10.1186/s13059-019-1874-1