
glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data

MOTIVATION: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts and an essential building block for analysis approaches including differential expression analysis, principal component analysis and factor...


Bibliographic Details
Main Authors: Ahlmann-Eltze, Constantin; Huber, Wolfgang
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8023675/
https://www.ncbi.nlm.nih.gov/pubmed/33295604
http://dx.doi.org/10.1093/bioinformatics/btaa1009
_version_ 1783675158538485760
author Ahlmann-Eltze, Constantin
Huber, Wolfgang
author_facet Ahlmann-Eltze, Constantin
Huber, Wolfgang
author_sort Ahlmann-Eltze, Constantin
collection PubMed
description MOTIVATION: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts and an essential building block for analysis approaches including differential expression analysis, principal component analysis and factor analysis. Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. RESULTS: We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. AVAILABILITY AND IMPLEMENTATION: The package glmGamPoi is available from Bioconductor for Windows, macOS and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license. The scripts to reproduce the results of this paper are available on github.com/const-ae/glmGamPoi-Paper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8023675
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-80236752021-04-13 glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data Ahlmann-Eltze, Constantin Huber, Wolfgang Bioinformatics Applications Notes MOTIVATION: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts and an essential building block for analysis approaches including differential expression analysis, principal component analysis and factor analysis. Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. RESULTS: We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. AVAILABILITY AND IMPLEMENTATION: The package glmGamPoi is available from Bioconductor for Windows, macOS and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license. The scripts to reproduce the results of this paper are available on github.com/const-ae/glmGamPoi-Paper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-12-09 /pmc/articles/PMC8023675/ /pubmed/33295604 http://dx.doi.org/10.1093/bioinformatics/btaa1009 Text en © The Author(s) 2020. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Applications Notes
Ahlmann-Eltze, Constantin
Huber, Wolfgang
glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data
title glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8023675/
https://www.ncbi.nlm.nih.gov/pubmed/33295604
http://dx.doi.org/10.1093/bioinformatics/btaa1009