EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data
Advances in technology have made it convenient to obtain large amounts of single-cell RNA sequencing (scRNA-seq) data. Because clustering is an important step in identifying or defining cellular phenotypes, many clustering approaches have recently been developed for this application. The g...
Main authors: | Zhu, Yuan; Zhang, De-Xin; Zhang, Xiao-Fei; Yi, Ming; Ou-Yang, Le; Wu, Mengyun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2020 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673820/ https://www.ncbi.nlm.nih.gov/pubmed/33329710 http://dx.doi.org/10.3389/fgene.2020.572242 |
_version_ | 1783611395803185152 |
---|---|
author | Zhu, Yuan; Zhang, De-Xin; Zhang, Xiao-Fei; Yi, Ming; Ou-Yang, Le; Wu, Mengyun |
author_facet | Zhu, Yuan; Zhang, De-Xin; Zhang, Xiao-Fei; Yi, Ming; Ou-Yang, Le; Wu, Mengyun |
author_sort | Zhu, Yuan |
collection | PubMed |
description | Advances in technology have made it convenient to obtain large amounts of single-cell RNA sequencing (scRNA-seq) data. Because clustering is an important step in identifying or defining cellular phenotypes, many clustering approaches have recently been developed for this application. These methods can be roughly divided into normal clustering methods and integrated (ensemble) clustering methods, which combine more than two normal clustering methods with the aim of obtaining more informative results. To contrast them with integrated clustering algorithms, normal clustering methods are often called individual or base clustering methods. Individual clustering methods are often designed to capture one aspect of the data, and their results depend on initial parameter settings such as the cluster number and the distance metric. Integrative clustering may be more accurate than individual clustering, but its results depend on the base clustering results, and integrated systems often lack self-regulation. Designing a robust unsupervised clustering method therefore remains a challenge. To address these limitations, we propose a novel Ensemble Clustering algorithm based on a Probability Graphical Model with Graph Regularization, called EC-PGMGR for short. On the one hand, we use parameter control in the Probability Graphical Model (PGM) to determine the cluster number automatically, without prior knowledge. On the other hand, we add a regularization term to reduce the influence of weak base clustering results. In particular, the results collected from the base clustering methods are combined using self-regulating weights obtained through a pre-learning process, which strengthens the effect of active clustering methods and weakens the effect of inactive ones. Experiments are carried out on 7 data sets generated by different platforms, with the number of single cells ranging from 822 to 5,132. The results show that EC-PGMGR performs better than 4 alternative individual clustering methods and 2 ensemble methods in terms of accuracy, measured by the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), as well as robustness and effectiveness. EC-PGMGR provides an effective way to integrate different clustering results into more accurate and reliable results for further biological analysis, and it may offer new insights for other applications of clustering. |
format | Online Article Text |
id | pubmed-7673820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7673820 2020-12-15 EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data Zhu, Yuan; Zhang, De-Xin; Zhang, Xiao-Fei; Yi, Ming; Ou-Yang, Le; Wu, Mengyun Front Genet Genetics Frontiers Media S.A. 2020-11-04 /pmc/articles/PMC7673820/ /pubmed/33329710 http://dx.doi.org/10.3389/fgene.2020.572242 Text en Copyright © 2020 Zhu, Zhang, Zhang, Yi, Ou-Yang and Wu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Zhu, Yuan Zhang, De-Xin Zhang, Xiao-Fei Yi, Ming Ou-Yang, Le Wu, Mengyun EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data |
title | EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data |
title_sort | ec-pgmgr: ensemble clustering based on probability graphical model with graph regularization for single-cell rna-seq data |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673820/ https://www.ncbi.nlm.nih.gov/pubmed/33329710 http://dx.doi.org/10.3389/fgene.2020.572242 |