bayNorm: Bayesian gene expression recovery, imputation and normalization for single-cell RNA-sequencing data

Bibliographic Details
Main authors: Tang, Wenhao, Bertaux, François, Thomas, Philipp, Stefanelli, Claire, Saint, Malika, Marguerat, Samuel, Shahrezaei, Vahid
Format: Online Article Text
Language: English
Published: Oxford University Press, 2020
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703772/
https://www.ncbi.nlm.nih.gov/pubmed/31584606
http://dx.doi.org/10.1093/bioinformatics/btz726

Description:
MOTIVATION: Normalization of single-cell RNA-sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability, high amounts of missing observations and batch effects typical of scRNA-seq datasets make this task particularly challenging. There is a need for an efficient and unified approach for normalization, imputation and batch effect correction.
RESULTS: Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We first validate our assumptions by showing this model can reproduce different statistics observed in real scRNA-seq data. We demonstrate using publicly available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values, generating realistic transcript distributions that match single molecule fluorescence in situ hybridization measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effects compared with other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalization, imputation and true count recovery of gene expression measurements from scRNA-seq data.
AVAILABILITY AND IMPLEMENTATION: The R package ‘bayNorm’ is published on Bioconductor at https://bioconductor.org/packages/release/bioc/html/bayNorm.html. The code for analyzing data in this article is available at https://github.com/WT215/bayNorm_papercode.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
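The abstract describes the model only at a high level: a binomial likelihood for mRNA capture and priors estimated across cells by empirical Bayes. The following is a minimal sketch of that idea for a single gene, not the bayNorm package's API or its exact estimation procedure. It assumes a negative binomial prior with mean `mu` and size `size`, a known per-cell capture efficiency `beta` strictly between 0 and 1, and `mu > 0`. A crude method-of-moments estimate stands in for the empirical Bayes step. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def log_nb_pmf(n: int, mu: float, size: float) -> float:
    """Log pmf of a negative binomial with mean `mu` and size (dispersion) `size`."""
    p = size / (size + mu)  # success probability in the (size, p) parametrization
    return (math.lgamma(n + size) - math.lgamma(size) - math.lgamma(n + 1)
            + size * math.log(p) + n * math.log1p(-p))


def log_binom_pmf(x: int, n: int, beta: float) -> float:
    """Log pmf of capturing x of n molecules, each captured with probability beta."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(beta) + (n - x) * math.log1p(-beta))


def posterior_true_count(x: int, beta: float, mu: float, size: float,
                         n_max: int = 2000) -> dict:
    """Posterior p(n | x) over true counts x <= n <= n_max (hypothetical helper).

    Binomial capture likelihood times negative binomial prior, normalized on a
    truncated grid; n_max must be large relative to mu / beta.
    """
    ns = range(x, n_max + 1)
    logs = [log_binom_pmf(x, n, beta) + log_nb_pmf(n, mu, size) for n in ns]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]  # subtract max for numerical stability
    total = sum(weights)
    return {n: w / total for n, w in zip(ns, weights)}


def posterior_mean(post: dict) -> float:
    """Expected true count under a posterior returned by posterior_true_count."""
    return sum(n * p for n, p in post.items())


def moment_prior(counts, betas):
    """Crude method-of-moments NB prior from capture-scaled counts (an assumption,
    standing in for the empirical Bayes estimate described in the abstract)."""
    scaled = [x / b for x, b in zip(counts, betas)]
    m = mean(scaled)
    v = pvariance(scaled)
    size = m * m / (v - m) if v > m else 1e6  # near-Poisson when not overdispersed
    return m, size


if __name__ == "__main__":
    # Toy, made-up observed counts and capture efficiencies for one gene.
    counts = [0, 1, 3, 0, 2, 5, 1, 0]
    betas = [0.05, 0.08, 0.10, 0.06, 0.07, 0.12, 0.09, 0.05]
    mu, size = moment_prior(counts, betas)
    for x, b in zip(counts, betas):
        post = posterior_true_count(x, b, mu, size)
        print(x, b, round(posterior_mean(post), 2))
```

The grid is used for transparency. Under these particular assumptions the posterior also has a closed form. Given x, the uncaptured count n - x is negative binomial with size `size + x` and failure probability q = (1 - beta)(1 - p), where p = size / (size + mu). The grid result therefore matches x + (size + x) q / (1 - q).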
Journal: Bioinformatics (Original Papers), Oxford University Press. Published online 2019-10-04; issue date 2020-02-15.
License: © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.