A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping

The availability of diverse types of high-throughput data gives researchers more opportunities to develop computational methods that provide a more comprehensive view of the mechanisms and therapy of cancer. One fundamental goal of oncology is to divide patients into subtypes with clinical and bio...

Bibliographic Details
Main authors: Yang, Chao, Wang, Yu-Tian, Zheng, Chun-Hou
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356971/
https://www.ncbi.nlm.nih.gov/pubmed/30669418
http://dx.doi.org/10.3390/genes10010066
author Yang, Chao
Wang, Yu-Tian
Zheng, Chun-Hou
collection PubMed
description The availability of diverse types of high-throughput data gives researchers more opportunities to develop computational methods that provide a more comprehensive view of the mechanisms and therapy of cancer. One fundamental goal of oncology is to divide patients into subtypes with clinical and biological significance. Cluster ensembles are well suited to this task: by combining multiple base clustering results, they can improve the performance and robustness of the final clustering. However, many existing cluster ensemble methods use a co-association matrix to summarize instance-cluster co-occurrence statistics, which captures the relationships in the integration only coarsely. Moreover, the relationships among clusters are ignored entirely. Recovering these missing associations could greatly expand the ability of cluster ensemble methods to identify cancer subtypes. In this paper, we propose RWCE (Random Walk based Cluster Ensemble), which takes similarity among clusters into account. We first obtain a refined similarity between clusters by using a random walk and a scaled exponential similarity kernel. Then, after the ensemble is modeled as a bipartite graph, a more informative instance-cluster association matrix filled with this cluster similarity is fed into a spectral clustering algorithm to obtain the final clustering result. We applied our method to six cancer types from The Cancer Genome Atlas (TCGA) and to breast cancer from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). Experimental results show that our method is competitive with existing methods. A further case study demonstrates that our method has the potential to find subtypes with clinical and biological significance.
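The pipeline outlined in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Jaccard starting similarity, the averaging of multi-step transition probabilities, the rule for filling the association matrix, and all function names are hypothetical choices made for this example. The scaled exponential similarity kernel and the final spectral clustering step are left out.

```python
# Sketch only: every name and design choice below is an assumption for
# illustration; the record does not specify the actual RWCE formulas.

def build_clusters(labelings):
    """List every cluster of every base clustering as a set of instance ids."""
    members, source = [], []
    for m, labels in enumerate(labelings):
        for c in sorted(set(labels)):
            members.append({i for i, lab in enumerate(labels) if lab == c})
            source.append(m)  # index of the base clustering this cluster came from
    return members, source

def jaccard(members):
    """Initial cluster-cluster similarity (assumed: Jaccard overlap of members)."""
    k = len(members)
    return [[len(members[a] & members[b]) / len(members[a] | members[b])
             for b in range(k)] for a in range(k)]

def random_walk_refine(S, steps=2):
    """Average the 1..steps-step transition probabilities, then symmetrize."""
    k = len(S)
    P = [[v / sum(row) for v in row] for row in S]  # row-stochastic transitions
    power, total = P, [row[:] for row in P]
    for _ in range(steps - 1):
        power = [[sum(power[a][c] * P[c][b] for c in range(k))
                  for b in range(k)] for a in range(k)]
        total = [[total[a][b] + power[a][b] for b in range(k)] for a in range(k)]
    return [[(total[a][b] + total[b][a]) / (2 * steps)
             for b in range(k)] for a in range(k)]

def association_matrix(labelings, steps=2):
    """Instance-cluster matrix: 1 for membership, refined similarity otherwise."""
    members, source = build_clusters(labelings)
    R = random_walk_refine(jaccard(members), steps)
    n, k = len(labelings[0]), len(members)
    A = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(k):
            if i in members[j]:
                A[i][j] = 1.0
            else:
                # Cluster of the same base clustering that does contain i.
                own = next(c for c in range(k)
                           if source[c] == source[j] and i in members[c])
                A[i][j] = R[own][j]
    return A

if __name__ == "__main__":
    # Two toy base clusterings of four patients (hypothetical data).
    for row in association_matrix([[0, 0, 1, 1], [0, 0, 0, 1]]):
        print([round(v, 3) for v in row])
```

Clusters from the same base clustering never share members, so their Jaccard similarity is zero. In this sketch the multi-step walk is what gives such pairs a nonzero similarity, by routing through overlapping clusters from other base clusterings. The resulting matrix would then be treated as the weights of a bipartite instance-cluster graph and passed to a spectral clustering routine.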
format Online
Article
Text
id pubmed-6356971
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6356971 2019-02-04 A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping Yang, Chao Wang, Yu-Tian Zheng, Chun-Hou Genes (Basel) Article
MDPI 2019-01-18 /pmc/articles/PMC6356971/ /pubmed/30669418 http://dx.doi.org/10.3390/genes10010066 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356971/
https://www.ncbi.nlm.nih.gov/pubmed/30669418
http://dx.doi.org/10.3390/genes10010066