
Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level

SUMMARY: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis,...


Bibliographic Details
Main authors: Sarkar, Hirak; Srivastava, Avi; Patro, Rob
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612833/
https://www.ncbi.nlm.nih.gov/pubmed/31510649
http://dx.doi.org/10.1093/bioinformatics/btz351
author Sarkar, Hirak
Srivastava, Avi
Patro, Rob
collection PubMed
description SUMMARY: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcode (CB) and UMI selection, and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6612833
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6612833 2019-07-12 Bioinformatics Ismb/Eccb 2019 Conference Proceedings Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612833/ /pubmed/31510649 http://dx.doi.org/10.1093/bioinformatics/btz351 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level
topic Ismb/Eccb 2019 Conference Proceedings