Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level
SUMMARY: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis,...
Main authors: | Sarkar, Hirak; Srivastava, Avi; Patro, Rob |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Ismb/Eccb 2019 Conference Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612833/ https://www.ncbi.nlm.nih.gov/pubmed/31510649 http://dx.doi.org/10.1093/bioinformatics/btz351 |
_version_ | 1783432947290865664 |
---|---|
author | Sarkar, Hirak Srivastava, Avi Patro, Rob |
author_facet | Sarkar, Hirak Srivastava, Avi Patro, Rob |
author_sort | Sarkar, Hirak |
collection | PubMed |
description | SUMMARY: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcode (CB) and UMI selection, and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6612833 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6612833 2019-07-12 Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level Sarkar, Hirak Srivastava, Avi Patro, Rob Bioinformatics Ismb/Eccb 2019 Conference Proceedings SUMMARY: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcode (CB) and UMI selection, and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612833/ /pubmed/31510649 http://dx.doi.org/10.1093/bioinformatics/btz351 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Ismb/Eccb 2019 Conference Proceedings Sarkar, Hirak Srivastava, Avi Patro, Rob Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level |
title | Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level |
title_full | Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level |
title_fullStr | Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level |
title_full_unstemmed | Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level |
title_short | Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level |
title_sort | minnow: a principled framework for rapid simulation of dscrna-seq data at the read level |
topic | Ismb/Eccb 2019 Conference Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612833/ https://www.ncbi.nlm.nih.gov/pubmed/31510649 http://dx.doi.org/10.1093/bioinformatics/btz351 |
work_keys_str_mv | AT sarkarhirak minnowaprincipledframeworkforrapidsimulationofdscrnaseqdataatthereadlevel AT srivastavaavi minnowaprincipledframeworkforrapidsimulationofdscrnaseqdataatthereadlevel AT patrorob minnowaprincipledframeworkforrapidsimulationofdscrnaseqdataatthereadlevel |