bigSCale: an analytical framework for big-scale single-cell data


Bibliographic Details
Main Authors: Iacono, Giovanni, Mereu, Elisabetta, Guillaumet-Adkins, Amy, Corominas, Roser, Cuscó, Ivon, Rodríguez-Esteban, Gustavo, Gut, Marta, Pérez-Jurado, Luis Alberto, Gut, Ivo, Heyn, Holger
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5991513/
https://www.ncbi.nlm.nih.gov/pubmed/29724792
http://dx.doi.org/10.1101/gr.230771.117
author Iacono, Giovanni
Mereu, Elisabetta
Guillaumet-Adkins, Amy
Corominas, Roser
Cuscó, Ivon
Rodríguez-Esteban, Gustavo
Gut, Marta
Pérez-Jurado, Luis Alberto
Gut, Ivo
Heyn, Holger
author_sort Iacono, Giovanni
collection PubMed
description Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin (Reln)-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets.
format Online
Article
Text
id pubmed-5991513
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-5991513 2018-06-18 Genome Res Method Cold Spring Harbor Laboratory Press 2018-06 /pmc/articles/PMC5991513/ /pubmed/29724792 http://dx.doi.org/10.1101/gr.230771.117 Text en © 2018 Iacono et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title bigSCale: an analytical framework for big-scale single-cell data
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5991513/
https://www.ncbi.nlm.nih.gov/pubmed/29724792
http://dx.doi.org/10.1101/gr.230771.117