Compound models and Pearson residuals for normalization of single-cell RNA-seq data without UMIs

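Before downstream analysis can reveal biological signals in single-cell RNA sequencing data, normalization and variance stabilization are required to remove technical noise. Recently, Pearson residuals based on negative binomial models have been suggested as an efficient normalization approach. These methods were developed for UMI-based sequencing protocols, where unique molecular identifiers (UMIs) help to remove PCR amplification noise by keeping track of the original molecules. In contrast, full-length protocols such as Smart-seq2 lack UMIs and retain amplification noise, making negative binomial models inapplicable. Here, we extend Pearson residuals to such read count data by modeling them as a compound process: we assume that the captured RNA molecules follow the negative binomial distribution, but are replicated according to an amplification distribution. Based on this model, we introduce compound Pearson residuals and show that they can be analytically obtained without explicit knowledge of the amplification distribution. Further, we demonstrate that compound Pearson residuals lead to a biologically meaningful gene selection and low-dimensional embeddings of complex Smart-seq2 datasets. Finally, we empirically study amplification distributions across several sequencing protocols, and suggest that they can be described by a broken power law. We show that the resulting compound distribution captures overdispersion and zero-inflation patterns characteristic of read count data. In summary, compound Pearson residuals provide an efficient and effective way to normalize read count data based on simple mechanistic assumptions.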
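The compound process described in the abstract of this record (captured molecules drawn from a negative binomial, each replicated according to an amplification distribution) can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a read count X = Z_1 + ... + Z_N with N negative binomial (mean m, overdispersion theta) and independent amplification factors Z_i, so that the law of total variance gives Var(X) = alpha_z * mu + mu^2 / theta, where mu = m * E[Z] and alpha_z = E[Z^2] / E[Z]. The function names, the parameter values, and the four-point amplification distribution are hypothetical choices made for illustration only; the record does not state the paper's exact parametrization.

```python
import numpy as np


def simulate_compound_counts(mean_molecules, theta, amp_support, amp_probs, n_cells, rng):
    """Draw read counts X = Z_1 + ... + Z_N for one gene across n_cells cells."""
    # Captured molecules N: negative binomial with mean m and variance m + m^2 / theta.
    p = theta / (theta + mean_molecules)
    molecules = rng.negative_binomial(theta, p, size=n_cells)
    # One amplification factor Z per captured molecule (illustrative distribution).
    z = rng.choice(amp_support, size=int(molecules.sum()), p=amp_probs)
    # Sum the amplification factors that belong to each cell.
    cell_index = np.repeat(np.arange(n_cells), molecules)
    return np.bincount(cell_index, weights=z, minlength=n_cells)


def compound_variance(mu, theta, alpha_z):
    """Variance of the compound count, from the law of total variance."""
    return alpha_z * mu + mu**2 / theta


def compound_pearson_residuals(counts, mu, theta, alpha_z):
    """Center by the expected count and scale by the compound standard deviation."""
    return (counts - mu) / np.sqrt(compound_variance(mu, theta, alpha_z))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical amplification distribution: most molecules yield few reads, some many.
    support = np.array([1, 2, 5, 20])
    probs = np.array([0.5, 0.3, 0.15, 0.05])
    m, theta = 5.0, 10.0

    e_z = float(np.sum(support * probs))
    alpha_z = float(np.sum(support**2 * probs)) / e_z
    mu = m * e_z

    reads = simulate_compound_counts(m, theta, support, probs, 100_000, rng)
    residuals = compound_pearson_residuals(reads, mu, theta, alpha_z)

    print("theoretical variance:", compound_variance(mu, theta, alpha_z))
    print("empirical variance:  ", reads.var())
    print("residual variance:   ", residuals.var())
```

Note that the residual needs only the summary alpha_z of the amplification distribution and not its full shape, which is one way to read the abstract's statement that the residuals can be obtained without explicit knowledge of that distribution.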

Bibliographic Details
Main Authors:	Lause, Jan; Ziegenhain, Christoph; Hartmanis, Leonard; Berens, Philipp; Kobak, Dmitry
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418209/ https://www.ncbi.nlm.nih.gov/pubmed/37577688 http://dx.doi.org/10.1101/2023.08.02.551637

author	Lause, Jan; Ziegenhain, Christoph; Hartmanis, Leonard; Berens, Philipp; Kobak, Dmitry
author_sort	Lause, Jan
collection	PubMed
description	Before downstream analysis can reveal biological signals in single-cell RNA sequencing data, normalization and variance stabilization are required to remove technical noise. Recently, Pearson residuals based on negative binomial models have been suggested as an efficient normalization approach. These methods were developed for UMI-based sequencing protocols, where unique molecular identifiers (UMIs) help to remove PCR amplification noise by keeping track of the original molecules. In contrast, full-length protocols such as Smart-seq2 lack UMIs and retain amplification noise, making negative binomial models inapplicable. Here, we extend Pearson residuals to such read count data by modeling them as a compound process: we assume that the captured RNA molecules follow the negative binomial distribution, but are replicated according to an amplification distribution. Based on this model, we introduce compound Pearson residuals and show that they can be analytically obtained without explicit knowledge of the amplification distribution. Further, we demonstrate that compound Pearson residuals lead to a biologically meaningful gene selection and low-dimensional embeddings of complex Smart-seq2 datasets. Finally, we empirically study amplification distributions across several sequencing protocols, and suggest that they can be described by a broken power law. We show that the resulting compound distribution captures overdispersion and zero-inflation patterns characteristic of read count data. In summary, compound Pearson residuals provide an efficient and effective way to normalize read count data based on simple mechanistic assumptions.
format	Online Article Text
id	pubmed-10418209
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cold Spring Harbor Laboratory
record_format	MEDLINE/PubMed
spelling	pubmed-10418209 2023-08-12 Compound models and Pearson residuals for normalization of single-cell RNA-seq data without UMIs Lause, Jan; Ziegenhain, Christoph; Hartmanis, Leonard; Berens, Philipp; Kobak, Dmitry bioRxiv Article Cold Spring Harbor Laboratory 2023-08-05 /pmc/articles/PMC10418209/ /pubmed/37577688 http://dx.doi.org/10.1101/2023.08.02.551637 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title	Compound models and Pearson residuals for normalization of single-cell RNA-seq data without UMIs
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418209/ https://www.ncbi.nlm.nih.gov/pubmed/37577688 http://dx.doi.org/10.1101/2023.08.02.551637
