A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification
MOTIVATION: Droplet-based single-cell RNA-seq (dscRNA-seq) data are being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When pre-processing the raw dscRNA-seq data to generate a count matrix...
Main Authors: | Srivastava, Avi; Malik, Laraib; Sarkar, Hirak; Patro, Rob |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Macromolecular Sequence, Structure, and Function |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355277/ https://www.ncbi.nlm.nih.gov/pubmed/32657394 http://dx.doi.org/10.1093/bioinformatics/btaa450 |
_version_ | 1783558243132375040 |
---|---|
author | Srivastava, Avi; Malik, Laraib; Sarkar, Hirak; Patro, Rob |
author_sort | Srivastava, Avi |
collection | PubMed |
description | MOTIVATION: Droplet-based single-cell RNA-seq (dscRNA-seq) data are being generated at an unprecedented pace, and the accurate estimation of gene-level abundances for each cell is a crucial first step in most dscRNA-seq analyses. When pre-processing the raw dscRNA-seq data to generate a count matrix, care must be taken to account for the potentially large number of multi-mapping locations per read. The sparsity of dscRNA-seq data and the strong 3’ sampling bias make it difficult to disambiguate cases where there is no uniquely mapping read to any of the candidate target genes. RESULTS: We introduce a Bayesian framework for information sharing across cells within a sample, or across multiple modalities of data using the same sample, to improve gene quantification estimates for dscRNA-seq data. We use an anchor-based approach to connect cells with similar gene-expression patterns, and learn informative, empirical priors which we provide to alevin’s gene multi-mapping resolution algorithm. This improves the quantification estimates for genes with no uniquely mapping reads (i.e. when there is no unique intra-cellular information). We show that our new model improves the per-cell gene-level estimates and provides a principled framework for information sharing across multiple modalities. We test our method on a combination of simulated and real datasets under various setups. AVAILABILITY AND IMPLEMENTATION: The information sharing model is included in alevin and is implemented in C++14. It is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/salmon as of version 1.1.0. |
format | Online Article Text |
id | pubmed-7355277 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7355277 2020-07-16 A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification. Srivastava, Avi; Malik, Laraib; Sarkar, Hirak; Patro, Rob. Bioinformatics. Macromolecular Sequence, Structure, and Function. Oxford University Press 2020-07 2020-07-13 /pmc/articles/PMC7355277/ /pubmed/32657394 http://dx.doi.org/10.1093/bioinformatics/btaa450 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | A Bayesian framework for inter-cellular information sharing improves dscRNA-seq quantification |
topic | Macromolecular Sequence, Structure, and Function |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355277/ https://www.ncbi.nlm.nih.gov/pubmed/32657394 http://dx.doi.org/10.1093/bioinformatics/btaa450 |