Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study

With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Single-cell RNA sequencing (scRNA-seq) data clustering can grou...

Bibliographic Details
Main authors: Feng, Chao; Liu, Shufen; Zhang, Hao; Guan, Renchu; Li, Dan; Zhou, Fengfeng; Liang, Yanchun; Feng, Xiaoyue
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139673/
https://www.ncbi.nlm.nih.gov/pubmed/32235704
http://dx.doi.org/10.3390/ijms21062181
author Feng, Chao
Liu, Shufen
Zhang, Hao
Guan, Renchu
Li, Dan
Zhou, Fengfeng
Liang, Yanchun
Feng, Xiaoyue
collection PubMed
description With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Single-cell RNA sequencing (scRNA-seq) data clustering can group cells belonging to the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitation of existing scRNA-seq technologies. Traditional clustering methods are not effective and efficient for high-dimensional and sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were progressively performed on two large scRNA-seq datasets using 20 models. Results showed that the feature selection method contributed positively to high-dimensional and sparse scRNA-seq data. Moreover, feature-extraction methods were able to promote clustering performance, although this was not eternally immutable. Independent component analysis (ICA) performed well in those small compressed feature spaces, whereas principal component analysis was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering was combined with feature-extraction methods to achieve good results.
format Online
Article
Text
id pubmed-7139673
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7139673 2020-04-10 Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study Feng, Chao Liu, Shufen Zhang, Hao Guan, Renchu Li, Dan Zhou, Fengfeng Liang, Yanchun Feng, Xiaoyue Int J Mol Sci Article With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Single-cell RNA sequencing (scRNA-seq) data clustering can group cells belonging to the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitation of existing scRNA-seq technologies. Traditional clustering methods are not effective and efficient for high-dimensional and sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were progressively performed on two large scRNA-seq datasets using 20 models. Results showed that the feature selection method contributed positively to high-dimensional and sparse scRNA-seq data. Moreover, feature-extraction methods were able to promote clustering performance, although this was not eternally immutable. Independent component analysis (ICA) performed well in those small compressed feature spaces, whereas principal component analysis was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering was combined with feature-extraction methods to achieve good results. MDPI 2020-03-22 /pmc/articles/PMC7139673/ /pubmed/32235704 http://dx.doi.org/10.3390/ijms21062181 Text en © 2020 by the authors.
Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139673/
https://www.ncbi.nlm.nih.gov/pubmed/32235704
http://dx.doi.org/10.3390/ijms21062181