SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data
Main authors: | Ren, Xianwen; Zheng, Liangtao; Zhang, Zemin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6624216/ https://www.ncbi.nlm.nih.gov/pubmed/31202000 http://dx.doi.org/10.1016/j.gpb.2018.10.003 |
author | Ren, Xianwen; Zheng, Liangtao; Zhang, Zemin |
---|---|
collection | PubMed |
description | Clustering is a prevalent means of analyzing single cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold speedup while using only 66% of the memory required by the widely used software package SC3. Compared with k-means, SSCC improves accuracy by up to 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust. |
format | Online Article Text |
id | pubmed-6624216 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-6624216 (2019-07-22); journal: Genomics Proteomics Bioinformatics; article type: Method; Elsevier, 2019-04 and 2019-06-13; © 2019 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data |
topic | Method |