Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis

MOTIVATION: While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, the secondary analysis of the feature-barcode matrix is usually done by custom scripts. There is no fully automated pipeline in the R statistical environment, which would follow the curre...

Bibliographic Details
Main authors: Kubovčiak, Jan; Kolář, Michal; Novotný, Jiří
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10351969/
https://www.ncbi.nlm.nih.gov/pubmed/37465398
http://dx.doi.org/10.1093/bioadv/vbad089
_version_ 1785074418118033408
author Kubovčiak, Jan
Kolář, Michal
Novotný, Jiří
author_facet Kubovčiak, Jan
Kolář, Michal
Novotný, Jiří
author_sort Kubovčiak, Jan
collection PubMed
description MOTIVATION: While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, the secondary analysis of the feature-barcode matrix is usually done by custom scripts. There is no fully automated pipeline in the R statistical environment, which would follow the current best programming practices and requirements for reproducibility. RESULTS: We have developed scdrake, a fully automated workflow for secondary analysis of scRNA-seq data, which is fully implemented in the R language and built within the drake framework. The pipeline includes quality control, cell and gene filtering, normalization, detection of highly variable genes, dimensionality reduction, clustering, cell type annotation, detection of marker genes, differential expression analysis and integration of multiple samples. The pipeline is reproducible and scalable, has an efficient execution, provides easy extendability and access to intermediate results and outputs rich HTML reports. Scdrake is distributed as a Docker image, which provides a straightforward setup and enhances reproducibility. AVAILABILITY AND IMPLEMENTATION: The source code and documentation are available under the MIT license at https://github.com/bioinfocz/scdrake and https://bioinfocz.github.io/scdrake, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-10351969
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10351969 2023-07-18 Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis Kubovčiak, Jan Kolář, Michal Novotný, Jiří Bioinform Adv Application Note MOTIVATION: While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, the secondary analysis of the feature-barcode matrix is usually done by custom scripts. There is no fully automated pipeline in the R statistical environment, which would follow the current best programming practices and requirements for reproducibility. RESULTS: We have developed scdrake, a fully automated workflow for secondary analysis of scRNA-seq data, which is fully implemented in the R language and built within the drake framework. The pipeline includes quality control, cell and gene filtering, normalization, detection of highly variable genes, dimensionality reduction, clustering, cell type annotation, detection of marker genes, differential expression analysis and integration of multiple samples. The pipeline is reproducible and scalable, has an efficient execution, provides easy extendability and access to intermediate results and outputs rich HTML reports. Scdrake is distributed as a Docker image, which provides a straightforward setup and enhances reproducibility. AVAILABILITY AND IMPLEMENTATION: The source code and documentation are available under the MIT license at https://github.com/bioinfocz/scdrake and https://bioinfocz.github.io/scdrake, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. Oxford University Press 2023-07-06 /pmc/articles/PMC10351969/ /pubmed/37465398 http://dx.doi.org/10.1093/bioadv/vbad089 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Application Note
Kubovčiak, Jan
Kolář, Michal
Novotný, Jiří
Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
title Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
title_full Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
title_fullStr Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
title_full_unstemmed Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
title_short Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
title_sort scdrake: a reproducible and scalable pipeline for scrna-seq data analysis
topic Application Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10351969/
https://www.ncbi.nlm.nih.gov/pubmed/37465398
http://dx.doi.org/10.1093/bioadv/vbad089
work_keys_str_mv AT kubovciakjan scdrakeareproducibleandscalablepipelineforscrnaseqdataanalysis
AT kolarmichal scdrakeareproducibleandscalablepipelineforscrnaseqdataanalysis
AT novotnyjiri scdrakeareproducibleandscalablepipelineforscrnaseqdataanalysis