rCASC: reproducible classification analysis of single-cell sequencing data

BACKGROUND: Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools hav...

Bibliographic Details
Main authors: Alessandrì, Luca, Cordero, Francesca, Beccuti, Marco, Arigoni, Maddalena, Olivero, Martina, Romano, Greta, Rabellino, Sergio, Licheri, Nicola, De Libero, Gennaro, Pace, Luigia, Calogero, Raffaele A
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6732171/
https://www.ncbi.nlm.nih.gov/pubmed/31494672
http://dx.doi.org/10.1093/gigascience/giz105
author Alessandrì, Luca
Cordero, Francesca
Beccuti, Marco
Arigoni, Maddalena
Olivero, Martina
Romano, Greta
Rabellino, Sergio
Licheri, Nicola
De Libero, Gennaro
Pace, Luigia
Calogero, Raffaele A
collection PubMed
description BACKGROUND: Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows provide analysis flexibility while also achieving functional (i.e., information about the data and the tools used are saved as metadata) and computational reproducibility (i.e., a real image of the computational environment used to generate the data is stored) through a user-friendly environment. FINDINGS: rCASC is a modular workflow providing an integrated analysis environment (from count generation to cell subpopulation identification) exploiting Docker containerization to achieve both functional and computational reproducibility in data analysis. Hence, rCASC provides preprocessing tools to remove low-quality cells and/or specific bias, e.g., cell cycle. Subpopulation discovery can instead be achieved using different clustering techniques based on different distance metrics. Cluster quality is then estimated through the new metric "cell stability score" (CSS), which describes the stability of a cell in a cluster as a consequence of a perturbation induced by removing a random set of cells from the cell population. CSS provides better cluster robustness information than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures. CONCLUSIONS: rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve a computation-reproducible analysis. A Java GUI is provided to welcome users without computational skills in R.
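The abstract describes the cell stability score only in words: a cell is stable if it stays in its cluster when a random set of cells is removed and the data are clustered again. The following is a toy sketch of that idea, not rCASC's implementation. The function names, the 1-D k-means stand-in, the Jaccard-overlap test, the 0.8 threshold and the 10% drop fraction are all assumptions chosen for illustration.

```python
import random


def kmeans_1d(values, k=2, iters=20):
    """Toy 1-D k-means (k >= 2), used only as a stand-in clustering method."""
    lo, hi = min(values), max(values)
    # Deterministic initial centres spread evenly across the data range.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


def stability_scores(values, cluster_fn, n_perturbations=50, drop_fraction=0.1,
                     jaccard_threshold=0.8, seed=0):
    """Per-cell fraction of perturbed runs in which the cell kept its cluster.

    Hypothetical scoring rule: a cell counts as stable in a run when the set
    of cells sharing its cluster after re-clustering overlaps the set sharing
    its reference cluster with a Jaccard index >= jaccard_threshold.
    """
    rng = random.Random(seed)
    n = len(values)
    reference = cluster_fn(values)
    stable = [0] * n
    seen = [0] * n
    n_drop = max(1, int(n * drop_fraction))
    for _ in range(n_perturbations):
        # Perturbation: remove a random set of cells, then cluster the rest.
        dropped = set(rng.sample(range(n), n_drop))
        kept = [i for i in range(n) if i not in dropped]
        new_label_of = dict(zip(kept, cluster_fn([values[i] for i in kept])))
        for i in kept:
            ref_mates = {j for j in kept if reference[j] == reference[i]}
            new_mates = {j for j in kept if new_label_of[j] == new_label_of[i]}
            jaccard = len(ref_mates & new_mates) / len(ref_mates | new_mates)
            seen[i] += 1
            if jaccard >= jaccard_threshold:
                stable[i] += 1
    # Cells never retained in any run have no defined score.
    return [s / t if t else float("nan") for s, t in zip(stable, seen)]


# Two well-separated groups of made-up 1-D "expression" values.
cells = [0.0, 0.1, 0.2, 0.3, 0.4, 10.0, 10.1, 10.2, 10.3, 10.4]
scores = stability_scores(cells, kmeans_1d)
```

Comparing set overlap rather than raw labels avoids depending on how the clustering method happens to number its clusters from one run to the next.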
format Online
Article
Text
id pubmed-6732171
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6732171 2019-09-12 rCASC: reproducible classification analysis of single-cell sequencing data Gigascience Technical Note Oxford University Press 2019-09-08 /pmc/articles/PMC6732171/ /pubmed/31494672 http://dx.doi.org/10.1093/gigascience/giz105 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title rCASC: reproducible classification analysis of single-cell sequencing data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6732171/
https://www.ncbi.nlm.nih.gov/pubmed/31494672
http://dx.doi.org/10.1093/gigascience/giz105