
EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data

Advances in technology have made it convenient to obtain large amounts of single-cell RNA sequencing (scRNA-seq) data. Since clustering is a very important step in identifying or defining cellular phenotypes, many clustering approaches have recently been developed for these applications. The g...


Bibliographic Details
Main authors: Zhu, Yuan, Zhang, De-Xin, Zhang, Xiao-Fei, Yi, Ming, Ou-Yang, Le, Wu, Mengyun
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673820/
https://www.ncbi.nlm.nih.gov/pubmed/33329710
http://dx.doi.org/10.3389/fgene.2020.572242
author Zhu, Yuan
Zhang, De-Xin
Zhang, Xiao-Fei
Yi, Ming
Ou-Yang, Le
Wu, Mengyun
author_sort Zhu, Yuan
collection PubMed
description Advances in technology have made it convenient to obtain large amounts of single-cell RNA sequencing (scRNA-seq) data. Since clustering is a very important step in identifying or defining cellular phenotypes, many clustering approaches have recently been developed for these applications. These methods can be roughly divided into normal clustering methods and integrated (ensemble) clustering methods, which combine more than two normal clustering methods with the aim of obtaining more informative results. To contrast it with the integrated clustering algorithm, a normal clustering method is often called an individual or base clustering method. Many individual clustering methods are developed to capture only one aspect of the data, and their results depend on the initial parameter settings, such as the cluster number and the distance metric. Although an integrative clustering method may be more accurate than individual clustering, its results depend on the base clustering results, and integrated systems are often not self-regulating. Designing a robust unsupervised clustering method therefore remains a challenge. To tackle the above limitations, we propose a novel Ensemble Clustering algorithm based on Probability Graphical Model with Graph Regularization, called EC-PGMGR for short. On the one hand, we use parameter control in the Probability Graphical Model (PGM) to determine the cluster number automatically, without prior knowledge. On the other hand, we add a regularization term to reduce the influence of weak base clustering results. In particular, the results collected from the base clustering methods are combined using self-regulating weights obtained through a pre-learning process, which enhances the effect of active clustering methods while weakening the effect of inactive ones. Experiments are carried out on 7 data sets generated by different platforms, with the number of single cells ranging from 822 to 5,132. The results show that EC-PGMGR performs better than 4 alternative individual clustering methods and 2 ensemble methods in terms of accuracy, measured by the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), as well as robustness and effectiveness. EC-PGMGR also provides an effective way to integrate different clustering results to obtain more accurate and reliable results for further biological analysis, and it may offer new insights for other applications of clustering.
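The description above mentions two ideas that a small example may make more concrete: combining base clustering results with per-method weights, and scoring a clustering with the Adjusted Rand Index (ARI). The following is a minimal illustrative sketch in plain Python, not the EC-PGMGR algorithm itself: it contains no probability graphical model and no graph regularization. The function names, the toy labelings, and the weights are all hypothetical; in particular, the weights are hand-picked here, whereas the record states that EC-PGMGR obtains them through a pre-learning process.

```python
from collections import Counter
from math import comb


def weighted_coassociation(base_labels, weights):
    """Weighted co-association matrix (a generic ensemble device, assumed here).

    Entry (i, j) is the weight-normalised share of base clusterings that
    place cells i and j in the same cluster.
    """
    n = len(base_labels[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_labels, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index from the contingency table of two labelings."""
    n = len(truth)
    # Count pairs of cells grouped together in both, in truth, and in pred.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


# Toy data: three hypothetical base clusterings of six cells.
base = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],  # treated as the "weak" method, so it gets a low weight
    [1, 1, 0, 0, 2, 2],
]
weights = [0.45, 0.10, 0.45]  # hand-picked for illustration only

consensus = weighted_coassociation(base, weights)
print(consensus[2][3])  # support for grouping cells 2 and 3 together
print(adjusted_rand_index(base[0], base[2]))  # same partition, relabelled
```

Down-weighting the second labeling means its disagreement about cells 2 and 3 barely lowers their consensus score, which is the intuition behind weakening inactive base methods. ARI ignores the cluster label names, so two labelings that describe the same partition score 1.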
format Online
Article
Text
id pubmed-7673820
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7673820 2020-12-15 EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data Zhu, Yuan Zhang, De-Xin Zhang, Xiao-Fei Yi, Ming Ou-Yang, Le Wu, Mengyun Front Genet Genetics Frontiers Media S.A. 2020-11-04 /pmc/articles/PMC7673820/ /pubmed/33329710 http://dx.doi.org/10.3389/fgene.2020.572242 Text en Copyright © 2020 Zhu, Zhang, Zhang, Yi, Ou-Yang and Wu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title EC-PGMGR: Ensemble Clustering Based on Probability Graphical Model With Graph Regularization for Single-Cell RNA-seq Data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673820/
https://www.ncbi.nlm.nih.gov/pubmed/33329710
http://dx.doi.org/10.3389/fgene.2020.572242