
Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis

MOTIVATION: While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, the secondary analysis of the feature-barcode matrix is usually done by custom scripts. There is no fully automated pipeline in the R statistical environment, which would follow the curre...

Bibliographic Details
Main Authors:	Kubovčiak, Jan, Kolář, Michal, Novotný, Jiří
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Application Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10351969/ https://www.ncbi.nlm.nih.gov/pubmed/37465398 http://dx.doi.org/10.1093/bioadv/vbad089

_version_	1785074418118033408
author	Kubovčiak, Jan Kolář, Michal Novotný, Jiří
collection	PubMed
description	MOTIVATION: While the workflow for primary analysis of single-cell RNA-seq (scRNA-seq) data is well established, the secondary analysis of the feature-barcode matrix is usually done by custom scripts. There is no fully automated pipeline in the R statistical environment, which would follow the current best programming practices and requirements for reproducibility. RESULTS: We have developed scdrake, a fully automated workflow for secondary analysis of scRNA-seq data, which is fully implemented in the R language and built within the drake framework. The pipeline includes quality control, cell and gene filtering, normalization, detection of highly variable genes, dimensionality reduction, clustering, cell type annotation, detection of marker genes, differential expression analysis and integration of multiple samples. The pipeline is reproducible and scalable, has an efficient execution, provides easy extendability and access to intermediate results and outputs rich HTML reports. Scdrake is distributed as a Docker image, which provides a straightforward setup and enhances reproducibility. AVAILABILITY AND IMPLEMENTATION: The source code and documentation are available under the MIT license at https://github.com/bioinfocz/scdrake and https://bioinfocz.github.io/scdrake, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format	Online Article Text
id	pubmed-10351969
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10351969 2023-07-18 Kubovčiak, Jan Kolář, Michal Novotný, Jiří Bioinform Adv Application Note Oxford University Press 2023-07-06 /pmc/articles/PMC10351969/ /pubmed/37465398 http://dx.doi.org/10.1093/bioadv/vbad089 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis
topic	Application Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10351969/ https://www.ncbi.nlm.nih.gov/pubmed/37465398 http://dx.doi.org/10.1093/bioadv/vbad089
