
Normalization by distributional resampling of high throughput single-cell RNA-sequencing data

MOTIVATION: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expre...


Bibliographic Details
Main authors: Brown, Jared; Ni, Zijian; Mohanty, Chitrasen; Bacher, Rhonda; Kendziorski, Christina
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502161/
https://www.ncbi.nlm.nih.gov/pubmed/34146085
http://dx.doi.org/10.1093/bioinformatics/btab450
author Brown, Jared
Ni, Zijian
Mohanty, Chitrasen
Bacher, Rhonda
Kendziorski, Christina
collection PubMed
description MOTIVATION: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expression for library size (LS), allowing the variance and other properties of the gene-specific expression distribution to be non-constant in LS. This often results in reduced power and increased false discoveries in downstream analyses, a problem which is exacerbated by the high proportion of zeros present in most datasets. RESULTS: To address this, we present Dino, a normalization method based on a flexible negative-binomial mixture model of gene expression. As demonstrated in both simulated and case study datasets, by normalizing the entire gene expression distribution, Dino is robust to shallow sequencing, sample heterogeneity and varying zero proportions, leading to improved performance in downstream analyses in a number of settings. AVAILABILITY AND IMPLEMENTATION: The R package, Dino, is available on GitHub at https://github.com/JBrownBiostat/Dino. The Dino package is further archived and freely available on Zenodo at https://doi.org/10.5281/zenodo.4897558. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9502161
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9502161 2022-09-26 Normalization by distributional resampling of high throughput single-cell RNA-sequencing data. Brown, Jared; Ni, Zijian; Mohanty, Chitrasen; Bacher, Rhonda; Kendziorski, Christina. Bioinformatics. Original Papers. MOTIVATION: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expression for library size (LS), allowing the variance and other properties of the gene-specific expression distribution to be non-constant in LS. This often results in reduced power and increased false discoveries in downstream analyses, a problem which is exacerbated by the high proportion of zeros present in most datasets. RESULTS: To address this, we present Dino, a normalization method based on a flexible negative-binomial mixture model of gene expression. As demonstrated in both simulated and case study datasets, by normalizing the entire gene expression distribution, Dino is robust to shallow sequencing, sample heterogeneity and varying zero proportions, leading to improved performance in downstream analyses in a number of settings. AVAILABILITY AND IMPLEMENTATION: The R package, Dino, is available on GitHub at https://github.com/JBrownBiostat/Dino. The Dino package is further archived and freely available on Zenodo at https://doi.org/10.5281/zenodo.4897558. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-06-19 /pmc/articles/PMC9502161/ /pubmed/34146085 http://dx.doi.org/10.1093/bioinformatics/btab450 Text en © The Author(s) 2021. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Normalization by distributional resampling of high throughput single-cell RNA-sequencing data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502161/
https://www.ncbi.nlm.nih.gov/pubmed/34146085
http://dx.doi.org/10.1093/bioinformatics/btab450