SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data

Clustering is a prevalent analytical means to analyze single cell RNA sequencing (scRNA-seq) data but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are of pressing need. Here we proposed Spearman subsamplin...

Bibliographic Details
Main Authors: Ren, Xianwen, Zheng, Liangtao, Zhang, Zemin
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6624216/
https://www.ncbi.nlm.nih.gov/pubmed/31202000
http://dx.doi.org/10.1016/j.gpb.2018.10.003
author Ren, Xianwen
Zheng, Liangtao
Zhang, Zemin
collection PubMed
description Clustering is a prevalent analytical means to analyze single cell RNA sequencing (scRNA-seq) data but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are of pressing need. Here we proposed Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficacy for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, SSCC achieved 20% improvement for clustering accuracy and 50-fold acceleration, but only consumed 66% memory usage, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
format Online
Article
Text
id pubmed-6624216
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6624216 2019-07-22 SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data Ren, Xianwen Zheng, Liangtao Zhang, Zemin Genomics Proteomics Bioinformatics Method Clustering is a prevalent analytical means to analyze single cell RNA sequencing (scRNA-seq) data but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are of pressing need. Here we proposed Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficacy for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, SSCC achieved 20% improvement for clustering accuracy and 50-fold acceleration, but only consumed 66% memory usage, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust. Elsevier 2019-04 2019-06-13 /pmc/articles/PMC6624216/ /pubmed/31202000 http://dx.doi.org/10.1016/j.gpb.2018.10.003 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data
topic Method
work_keys_str_mv AT renxianwen ssccanovelcomputationalframeworkforrapidandaccurateclusteringlargescalesinglecellrnaseqdata
AT zhengliangtao ssccanovelcomputationalframeworkforrapidandaccurateclusteringlargescalesinglecellrnaseqdata
AT zhangzemin ssccanovelcomputationalframeworkforrapidandaccurateclusteringlargescalesinglecellrnaseqdata