Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with aggregateBioVar

Bibliographic Details
Main Authors: Thurman, Andrew L; Ratcliff, Jason A; Chimenti, Michael S; Pezzulo, Alejandro A
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Original Papers
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504643/
https://www.ncbi.nlm.nih.gov/pubmed/33970215
http://dx.doi.org/10.1093/bioinformatics/btab337
_version_ 1784581361794809856
author Thurman, Andrew L
Ratcliff, Jason A
Chimenti, Michael S
Pezzulo, Alejandro A
author_sort Thurman, Andrew L
collection PubMed
description MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) provides more granular biological information than bulk RNA-sequencing; however, bulk RNA-sequencing remains popular because its lower cost allows researchers to process more biological replicates and design more powerful studies. As scRNA-seq costs have decreased, collecting data from more than one biological replicate has become more feasible, but careful modeling of different layers of biological variation remains challenging for many users. Here, we propose a statistical model for scRNA-seq gene counts, describe a simple method for estimating model parameters and show that failing to account for additional biological variation in scRNA-seq studies can inflate false discovery rates (FDRs) of statistical tests. RESULTS: First, in a simulation study, we show that when the gene expression distribution of a population of cells varies between subjects, a naïve approach to differential expression analysis will inflate the FDR. We then compare multiple differential expression testing methods on scRNA-seq datasets from human samples and from animal models. These analyses suggest that a naïve approach to differential expression testing could lead to many false discoveries; in contrast, an approach based on pseudobulk counts has better FDR control. AVAILABILITY AND IMPLEMENTATION: A software package, aggregateBioVar, is freely available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/aggregateBioVar.html) to support compatibility with upstream and downstream methods in scRNA-seq data analysis pipelines. SUPPLEMENTARY INFORMATION: Raw gene-by-cell count matrices for pig scRNA-seq data are available as GEO accession GSE150211. Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8504643
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8504643 2021-10-13 Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with aggregateBioVar Thurman, Andrew L Ratcliff, Jason A Chimenti, Michael S Pezzulo, Alejandro A Bioinformatics Original Papers Oxford University Press 2021-05-10 /pmc/articles/PMC8504643/ /pubmed/33970215 http://dx.doi.org/10.1093/bioinformatics/btab337 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with aggregateBioVar
title_sort differential gene expression analysis for multi-subject single-cell rna-sequencing studies with aggregatebiovar
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504643/
https://www.ncbi.nlm.nih.gov/pubmed/33970215
http://dx.doi.org/10.1093/bioinformatics/btab337