Cargando…

A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping

Availability of diverse types of high-throughput data increases the opportunities for researchers to develop computational methods to provide a more comprehensive view for the mechanism and therapy of cancer. One fundamental goal for oncology is to divide patients into subtypes with clinical and bio...

Descripción completa

Detalles Bibliográficos
Autores principales: Yang, Chao, Wang, Yu-Tian, Zheng, Chun-Hou
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356971/
https://www.ncbi.nlm.nih.gov/pubmed/30669418
http://dx.doi.org/10.3390/genes10010066
Descripción
Sumario:Availability of diverse types of high-throughput data increases the opportunities for researchers to develop computational methods to provide a more comprehensive view for the mechanism and therapy of cancer. One fundamental goal for oncology is to divide patients into subtypes with clinical and biological significance. Cluster ensemble fits this task exactly. It can improve the performance and robustness of clustering results by combining multiple basic clustering results. However, many existing cluster ensemble methods use a co-association matrix to summarize the co-occurrence statistics of the instance-cluster, where the relationship in the integration is only encapsulated at a rough level. Moreover, the relationship among clusters is completely ignored. Finding these missing associations could greatly expand the ability of cluster ensemble methods for cancer subtyping. In this paper, we propose the RWCE (Random Walk based Cluster Ensemble) to consider similarity among clusters. We first obtained a refined similarity between clusters by using random walk and a scaled exponential similarity kernel. Then, after being modeled as a bipartite graph, a more informative instance-cluster association matrix filled with the aforementioned cluster similarity was fed into a spectral clustering algorithm to get the final clustering result. We applied our method on six cancer types from The Cancer Genome Atlas (TCGA) and breast cancer from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). Experimental results show that our method is competitive against existing methods. Further case study demonstrates that our method has the potential to find subtypes with clinical and biological significance.