Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data

A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultralarge scRNA-seq data sets produced by droplet-base...

Bibliographic Details
Main authors:	Do, Van Hoan, Rojas Ringeling, Francisca, Canzar, Stefan
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory Press 2021
Subjects:	Method
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015854/ https://www.ncbi.nlm.nih.gov/pubmed/33627473 http://dx.doi.org/10.1101/gr.267906.120

author	Do, Van Hoan Rojas Ringeling, Francisca Canzar, Stefan
collection	PubMed
description	A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultralarge scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose method Specter that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter's speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and identifies rare cell types with high sensitivity. Its linear-time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression, we show that Specter is able to use multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells.
format	Online Article Text
id	pubmed-8015854
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Cold Spring Harbor Laboratory Press
record_format	MEDLINE/PubMed
spelling	pubmed-8015854 2021-10-01 Genome Res Method Cold Spring Harbor Laboratory Press 2021-04 /pmc/articles/PMC8015854/ /pubmed/33627473 http://dx.doi.org/10.1101/gr.267906.120 Text en © 2021 Do et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title	Linear-time cluster ensembles of large-scale single-cell RNA-seq and multimodal data
title_sort	linear-time cluster ensembles of large-scale single-cell rna-seq and multimodal data
topic	Method
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8015854/ https://www.ncbi.nlm.nih.gov/pubmed/33627473 http://dx.doi.org/10.1101/gr.267906.120
