Normalization by distributional resampling of high throughput single-cell RNA-sequencing data
MOTIVATION: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expre...
Main Authors: | Brown, Jared; Ni, Zijian; Mohanty, Chitrasen; Bacher, Rhonda; Kendziorski, Christina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Original Papers |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502161/ https://www.ncbi.nlm.nih.gov/pubmed/34146085 http://dx.doi.org/10.1093/bioinformatics/btab450 |
_version_ | 1784795638142074880 |
---|---|
author | Brown, Jared Ni, Zijian Mohanty, Chitrasen Bacher, Rhonda Kendziorski, Christina |
author_facet | Brown, Jared Ni, Zijian Mohanty, Chitrasen Bacher, Rhonda Kendziorski, Christina |
author_sort | Brown, Jared |
collection | PubMed |
description | MOTIVATION: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expression for library size (LS), allowing the variance and other properties of the gene-specific expression distribution to be non-constant in LS. This often results in reduced power and increased false discoveries in downstream analyses, a problem which is exacerbated by the high proportion of zeros present in most datasets. RESULTS: To address this, we present Dino, a normalization method based on a flexible negative-binomial mixture model of gene expression. As demonstrated in both simulated and case study datasets, by normalizing the entire gene expression distribution, Dino is robust to shallow sequencing, sample heterogeneity and varying zero proportions, leading to improved performance in downstream analyses in a number of settings. AVAILABILITY AND IMPLEMENTATION: The R package, Dino, is available on GitHub at https://github.com/JBrownBiostat/Dino. The Dino package is further archived and freely available on Zenodo at https://doi.org/10.5281/zenodo.4897558. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9502161 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9502161 2022-09-26 Normalization by distributional resampling of high throughput single-cell RNA-sequencing data Brown, Jared Ni, Zijian Mohanty, Chitrasen Bacher, Rhonda Kendziorski, Christina Bioinformatics Original Papers MOTIVATION: Normalization to remove technical or experimental artifacts is critical in the analysis of single-cell RNA-sequencing experiments, even those for which unique molecular identifiers are available. The majority of methods for normalizing single-cell RNA-sequencing data adjust average expression for library size (LS), allowing the variance and other properties of the gene-specific expression distribution to be non-constant in LS. This often results in reduced power and increased false discoveries in downstream analyses, a problem which is exacerbated by the high proportion of zeros present in most datasets. RESULTS: To address this, we present Dino, a normalization method based on a flexible negative-binomial mixture model of gene expression. As demonstrated in both simulated and case study datasets, by normalizing the entire gene expression distribution, Dino is robust to shallow sequencing, sample heterogeneity and varying zero proportions, leading to improved performance in downstream analyses in a number of settings. AVAILABILITY AND IMPLEMENTATION: The R package, Dino, is available on GitHub at https://github.com/JBrownBiostat/Dino. The Dino package is further archived and freely available on Zenodo at https://doi.org/10.5281/zenodo.4897558. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-06-19 /pmc/articles/PMC9502161/ /pubmed/34146085 http://dx.doi.org/10.1093/bioinformatics/btab450 Text en © The Author(s) 2021. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Brown, Jared Ni, Zijian Mohanty, Chitrasen Bacher, Rhonda Kendziorski, Christina Normalization by distributional resampling of high throughput single-cell RNA-sequencing data |
title | Normalization by distributional resampling of high throughput single-cell RNA-sequencing data |
title_full | Normalization by distributional resampling of high throughput single-cell RNA-sequencing data |
title_fullStr | Normalization by distributional resampling of high throughput single-cell RNA-sequencing data |
title_full_unstemmed | Normalization by distributional resampling of high throughput single-cell RNA-sequencing data |
title_short | Normalization by distributional resampling of high throughput single-cell RNA-sequencing data |
title_sort | normalization by distributional resampling of high throughput single-cell rna-sequencing data |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502161/ https://www.ncbi.nlm.nih.gov/pubmed/34146085 http://dx.doi.org/10.1093/bioinformatics/btab450 |
work_keys_str_mv | AT brownjared normalizationbydistributionalresamplingofhighthroughputsinglecellrnasequencingdata AT nizijian normalizationbydistributionalresamplingofhighthroughputsinglecellrnasequencingdata AT mohantychitrasen normalizationbydistributionalresamplingofhighthroughputsinglecellrnasequencingdata AT bacherrhonda normalizationbydistributionalresamplingofhighthroughputsinglecellrnasequencingdata AT kendziorskichristina normalizationbydistributionalresamplingofhighthroughputsinglecellrnasequencingdata |