rCASC: reproducible classification analysis of single-cell sequencing data
BACKGROUND: Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools hav...
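Main authors: | Alessandrì, Luca; Cordero, Francesca; Beccuti, Marco; Arigoni, Maddalena; Olivero, Martina; Romano, Greta; Rabellino, Sergio; Licheri, Nicola; De Libero, Gennaro; Pace, Luigia; Calogero, Raffaele A |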
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6732171/ https://www.ncbi.nlm.nih.gov/pubmed/31494672 http://dx.doi.org/10.1093/gigascience/giz105 |
_version_ | 1783449778325028864 |
---|---|
author | Alessandrì, Luca Cordero, Francesca Beccuti, Marco Arigoni, Maddalena Olivero, Martina Romano, Greta Rabellino, Sergio Licheri, Nicola De Libero, Gennaro Pace, Luigia Calogero, Raffaele A |
author_facet | Alessandrì, Luca Cordero, Francesca Beccuti, Marco Arigoni, Maddalena Olivero, Martina Romano, Greta Rabellino, Sergio Licheri, Nicola De Libero, Gennaro Pace, Luigia Calogero, Raffaele A |
author_sort | Alessandrì, Luca |
collection | PubMed |
description | BACKGROUND: Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows provide analysis flexibility while also achieving functional (i.e., information about the data and the tools used are saved as metadata) and computational reproducibility (i.e., a real image of the computational environment used to generate the data is stored) through a user-friendly environment. FINDINGS: rCASC is a modular workflow providing an integrated analysis environment (from count generation to cell subpopulation identification) exploiting Docker containerization to achieve both functional and computational reproducibility in data analysis. Hence, rCASC provides preprocessing tools to remove low-quality cells and/or specific bias, e.g., cell cycle. Subpopulation discovery can instead be achieved using different clustering techniques based on different distance metrics. Cluster quality is then estimated through the new metric "cell stability score" (CSS), which describes the stability of a cell in a cluster as a consequence of a perturbation induced by removing a random set of cells from the cell population. CSS provides better cluster robustness information than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures. CONCLUSIONS: rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve a computation-reproducible analysis. A Java GUI is provided to welcome users without computational skills in R. |
format | Online Article Text |
id | pubmed-6732171 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6732171 2019-09-12 rCASC: reproducible classification analysis of single-cell sequencing data Alessandrì, Luca Cordero, Francesca Beccuti, Marco Arigoni, Maddalena Olivero, Martina Romano, Greta Rabellino, Sergio Licheri, Nicola De Libero, Gennaro Pace, Luigia Calogero, Raffaele A Gigascience Technical Note BACKGROUND: Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows provide analysis flexibility while also achieving functional (i.e., information about the data and the tools used are saved as metadata) and computational reproducibility (i.e., a real image of the computational environment used to generate the data is stored) through a user-friendly environment. FINDINGS: rCASC is a modular workflow providing an integrated analysis environment (from count generation to cell subpopulation identification) exploiting Docker containerization to achieve both functional and computational reproducibility in data analysis. Hence, rCASC provides preprocessing tools to remove low-quality cells and/or specific bias, e.g., cell cycle. Subpopulation discovery can instead be achieved using different clustering techniques based on different distance metrics. Cluster quality is then estimated through the new metric "cell stability score" (CSS), which describes the stability of a cell in a cluster as a consequence of a perturbation induced by removing a random set of cells from the cell population. CSS provides better cluster robustness information than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures. CONCLUSIONS: rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve a computation-reproducible analysis. A Java GUI is provided to welcome users without computational skills in R. Oxford University Press 2019-09-08 /pmc/articles/PMC6732171/ /pubmed/31494672 http://dx.doi.org/10.1093/gigascience/giz105 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Alessandrì, Luca Cordero, Francesca Beccuti, Marco Arigoni, Maddalena Olivero, Martina Romano, Greta Rabellino, Sergio Licheri, Nicola De Libero, Gennaro Pace, Luigia Calogero, Raffaele A rCASC: reproducible classification analysis of single-cell sequencing data |
title | rCASC: reproducible classification analysis of single-cell sequencing data |
title_full | rCASC: reproducible classification analysis of single-cell sequencing data |
title_fullStr | rCASC: reproducible classification analysis of single-cell sequencing data |
title_full_unstemmed | rCASC: reproducible classification analysis of single-cell sequencing data |
title_short | rCASC: reproducible classification analysis of single-cell sequencing data |
title_sort | rcasc: reproducible classification analysis of single-cell sequencing data |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6732171/ https://www.ncbi.nlm.nih.gov/pubmed/31494672 http://dx.doi.org/10.1093/gigascience/giz105 |
work_keys_str_mv | AT alessandriluca rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT corderofrancesca rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT beccutimarco rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT arigonimaddalena rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT oliveromartina rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT romanogreta rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT rabellinosergio rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT licherinicola rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT deliberogennaro rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT paceluigia rcascreproducibleclassificationanalysisofsinglecellsequencingdata AT calogeroraffaelea rcascreproducibleclassificationanalysisofsinglecellsequencingdata |
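
The abstract describes the cell stability score (CSS) only in words: a cell is considered stable if it stays in its cluster after a random set of cells is removed and the remaining cells are clustered again. The sketch below illustrates that general idea in plain Python and is not rCASC's implementation or API. Everything in it is an assumption made for illustration: the toy 1-D k-means (`cluster_1d`), the Jaccard-based matching of perturbed clusters to reference clusters, the `min_jaccard` threshold, the 10% removal fraction, and all function and parameter names.

```python
import random
from collections import defaultdict


def cluster_1d(values, k=2, iters=20):
    """Toy deterministic 1-D k-means; a stand-in for any clustering method."""
    ordered = sorted(values.values())
    # Spread the initial centers evenly across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = {}
    for _ in range(iters):
        # Assign each cell to its nearest center.
        labels = {
            cell: min(range(k), key=lambda j: abs(v - centers[j]))
            for cell, v in values.items()
        }
        # Move each center to the mean of its members.
        for j in range(k):
            members = [values[c] for c, lab in labels.items() if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def groups(labels):
    """Convert a {cell: label} mapping into a list of cell sets."""
    out = defaultdict(set)
    for cell, lab in labels.items():
        out[lab].add(cell)
    return list(out.values())


def cell_stability_scores(values, k=2, n_perturb=50, drop_fraction=0.1,
                          min_jaccard=0.8, seed=0):
    """Fraction of perturbations in which each cell stays in its cluster.

    Hypothetical scoring rule: a retained cell counts as stable when the
    perturbed cluster that best matches its reference cluster has a
    Jaccard index >= min_jaccard and still contains the cell.
    """
    rng = random.Random(seed)
    cells = sorted(values)
    ref_groups = groups(cluster_1d(values, k))
    n_drop = max(1, int(len(cells) * drop_fraction))
    stable = defaultdict(int)
    retained = defaultdict(int)
    for _ in range(n_perturb):
        # Perturbation: remove a random set of cells, then recluster.
        dropped = set(rng.sample(cells, n_drop))
        kept = {c: values[c] for c in cells if c not in dropped}
        new_groups = groups(cluster_1d(kept, k))
        for ref in ref_groups:
            ref_kept = ref - dropped
            if not ref_kept:
                continue
            # Best-matching perturbed cluster by Jaccard index.
            best = max(new_groups,
                       key=lambda g: len(g & ref_kept) / len(g | ref_kept))
            jaccard = len(best & ref_kept) / len(best | ref_kept)
            for cell in ref_kept:
                retained[cell] += 1
                if jaccard >= min_jaccard and cell in best:
                    stable[cell] += 1
    return {c: stable[c] / retained[c] for c in cells if retained[c]}


# Two well-separated toy populations of 10 cells each (made-up values).
values = {f"a{i}": i / 10 for i in range(10)}
values.update({f"b{i}": 10 + i / 10 for i in range(10)})
scores = cell_stability_scores(values)
```

In this toy input the two populations are far apart, so every perturbed clustering recovers the reference clusters and each cell scores 1.0. Cells lying between clusters would be expected to score lower, which is the per-cell robustness information the abstract contrasts with the silhouette metric.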