Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study
With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Single-cell RNA sequencing (scRNA-seq) data clustering can grou...
Main Authors: | Feng, Chao; Liu, Shufen; Zhang, Hao; Guan, Renchu; Li, Dan; Zhou, Fengfeng; Liang, Yanchun; Feng, Xiaoyue |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139673/ https://www.ncbi.nlm.nih.gov/pubmed/32235704 http://dx.doi.org/10.3390/ijms21062181 |
_version_ | 1783518820318576640 |
---|---|
author | Feng, Chao Liu, Shufen Zhang, Hao Guan, Renchu Li, Dan Zhou, Fengfeng Liang, Yanchun Feng, Xiaoyue |
author_facet | Feng, Chao Liu, Shufen Zhang, Hao Guan, Renchu Li, Dan Zhou, Fengfeng Liang, Yanchun Feng, Xiaoyue |
author_sort | Feng, Chao |
collection | PubMed |
description | With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Clustering of single-cell RNA sequencing (scRNA-seq) data can group cells of the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional, sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were performed progressively on two large scRNA-seq datasets using the 20 resulting models. The results showed that feature selection contributed positively to the analysis of high-dimensional and sparse scRNA-seq data. Moreover, feature-extraction methods were able to improve clustering performance, although not in every case. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis was more stable than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods. |
format | Online Article Text |
id | pubmed-7139673 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7139673 2020-04-10 Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study Feng, Chao Liu, Shufen Zhang, Hao Guan, Renchu Li, Dan Zhou, Fengfeng Liang, Yanchun Feng, Xiaoyue Int J Mol Sci Article MDPI 2020-03-22 /pmc/articles/PMC7139673/ /pubmed/32235704 http://dx.doi.org/10.3390/ijms21062181 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Feng, Chao Liu, Shufen Zhang, Hao Guan, Renchu Li, Dan Zhou, Fengfeng Liang, Yanchun Feng, Xiaoyue Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study |
title | Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study |
title_sort | dimension reduction and clustering models for single-cell rna sequencing data: a comparative study |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139673/ https://www.ncbi.nlm.nih.gov/pubmed/32235704 http://dx.doi.org/10.3390/ijms21062181 |