Gene expression distribution deconvolution in single-cell RNA sequencing

Bibliographic details
Main authors: Wang, Jingshu; Huang, Mo; Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Murray, John; Raj, Arjun; Li, Mingyao; Zhang, Nancy R.
Format: Online, Article, Text
Language: English
Published: National Academy of Sciences, 2018
Subjects: PNAS Plus
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6048536/
https://www.ncbi.nlm.nih.gov/pubmed/29946020
http://dx.doi.org/10.1073/pnas.1721085115
Collection: PubMed
Abstract: Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene’s expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI). We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND’s noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers.
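The abstract describes recovering a gene's true cross-cell expression distribution from noisy, low-coverage UMI counts. The Python sketch below illustrates why raw counts mislead, under stated assumptions that are not taken from this record: it assumes a Poisson measurement model with a hypothetical per-cell scaling factor s_c (standing in for capture efficiency or cell size) and a zero-inflated gamma distribution for true expression. It is not DESCEND. Instead of deconvolving the full distribution, it swaps in a simple method-of-moments correction that recovers only the mean and the dispersion (squared coefficient of variation, CV²). All function names are hypothetical.

```python
import numpy as np


def simulate_counts(n_cells, p_zero, shape, scale, rng):
    """Simulate UMI counts for one gene under an ASSUMED model (not from the record).

    True expression lam_c is 0 with probability p_zero (gene off in that cell),
    otherwise Gamma(shape, scale). The observed count is Y_c ~ Poisson(s_c * lam_c),
    where s_c is a hypothetical per-cell scaling factor standing in for
    capture efficiency / cell size.
    """
    s = rng.uniform(0.5, 1.5, size=n_cells)
    expressed = rng.random(n_cells) >= p_zero
    lam = np.where(expressed, rng.gamma(shape, scale, size=n_cells), 0.0)
    y = rng.poisson(s * lam)
    return y, s


def naive_moments(y, s):
    """Mean and squared CV of scale-normalized counts, ignoring Poisson noise."""
    x = y / s
    mean = x.mean()
    return mean, x.var() / mean**2


def deconvolved_moments(y, s):
    """Method-of-moments estimates of E[lam] and CV^2(lam) under Poisson noise.

    Uses E[Y/s | lam] = lam and E[(Y^2 - Y) / s^2 | lam] = lam^2, so the
    Poisson sampling variance is removed from the second moment.
    """
    mean = np.mean(y / s)
    second = np.mean((y**2 - y) / s**2)
    return mean, (second - mean**2) / mean**2


rng = np.random.default_rng(0)
y, s = simulate_counts(20_000, p_zero=0.3, shape=2.0, scale=0.5, rng=rng)
# Under these settings: E[lam] = 0.7, Var(lam) = 0.56, CV^2(lam) = 0.56 / 0.49 ~ 1.14,
# and the true fraction of expressing cells is 0.7.
print("observed nonzero fraction:", np.mean(y > 0))
print("naive (mean, CV^2):       ", naive_moments(y, s))
print("deconvolved (mean, CV^2): ", deconvolved_moments(y, s))
```

Under this simulation the observed nonzero fraction should land near 0.38 in expectation even though 70% of cells truly express the gene, and the naive CV² is inflated to roughly 2.7 against a true value of about 1.14, while the moment correction should land close to the truth. The nonzero fraction cannot be corrected from moments alone; that requires estimating the full expression distribution, which is the task the abstract attributes to DESCEND.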
Record ID: pubmed-6048536
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: Proc Natl Acad Sci U S A, PNAS Plus section; issue date 2018-07-10, published online 2018-06-26
License: Copyright © 2018 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).