A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping
The availability of diverse types of high-throughput data gives researchers more opportunities to develop computational methods that provide a more comprehensive view of the mechanisms and therapy of cancer. One fundamental goal of oncology is to divide patients into subtypes with clinical and bio...
Main authors: | Yang, Chao; Wang, Yu-Tian; Zheng, Chun-Hou |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356971/ https://www.ncbi.nlm.nih.gov/pubmed/30669418 http://dx.doi.org/10.3390/genes10010066 |
_version_ | 1783391682959507456 |
---|---|
author | Yang, Chao Wang, Yu-Tian Zheng, Chun-Hou |
author_facet | Yang, Chao Wang, Yu-Tian Zheng, Chun-Hou |
author_sort | Yang, Chao |
collection | PubMed |
description | The availability of diverse types of high-throughput data gives researchers more opportunities to develop computational methods that provide a more comprehensive view of the mechanisms and therapy of cancer. One fundamental goal of oncology is to divide patients into subtypes with clinical and biological significance. Cluster ensemble methods fit this task well: they can improve the performance and robustness of clustering by combining multiple base clustering results. However, many existing cluster ensemble methods use a co-association matrix to summarize instance-cluster co-occurrence statistics, which captures the relationships being integrated only at a coarse level. Moreover, the relationship among clusters is completely ignored. Finding these missing associations could greatly expand the ability of cluster ensemble methods for cancer subtyping. In this paper, we propose RWCE (Random Walk based Cluster Ensemble), which takes similarity among clusters into account. We first obtain a refined similarity between clusters by using a random walk and a scaled exponential similarity kernel. Then, after the problem is modeled as a bipartite graph, a more informative instance-cluster association matrix filled with the aforementioned cluster similarity is fed into a spectral clustering algorithm to get the final clustering result. We applied our method to six cancer types from The Cancer Genome Atlas (TCGA) and to breast cancer from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). Experimental results show that our method is competitive with existing methods. A further case study demonstrates that our method has the potential to find subtypes with clinical and biological significance. |
format | Online Article Text |
id | pubmed-6356971 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-63569712019-02-04 A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping Yang, Chao Wang, Yu-Tian Zheng, Chun-Hou Genes (Basel) Article Availability of diverse types of high-throughput data increases the opportunities for researchers to develop computational methods to provide a more comprehensive view for the mechanism and therapy of cancer. One fundamental goal for oncology is to divide patients into subtypes with clinical and biological significance. Cluster ensemble fits this task exactly. It can improve the performance and robustness of clustering results by combining multiple basic clustering results. However, many existing cluster ensemble methods use a co-association matrix to summarize the co-occurrence statistics of the instance-cluster, where the relationship in the integration is only encapsulated at a rough level. Moreover, the relationship among clusters is completely ignored. Finding these missing associations could greatly expand the ability of cluster ensemble methods for cancer subtyping. In this paper, we propose the RWCE (Random Walk based Cluster Ensemble) to consider similarity among clusters. We first obtained a refined similarity between clusters by using random walk and a scaled exponential similarity kernel. Then, after being modeled as a bipartite graph, a more informative instance-cluster association matrix filled with the aforementioned cluster similarity was fed into a spectral clustering algorithm to get the final clustering result. We applied our method on six cancer types from The Cancer Genome Atlas (TCGA) and breast cancer from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). Experimental results show that our method is competitive against existing methods. Further case study demonstrates that our method has the potential to find subtypes with clinical and biological significance. 
MDPI 2019-01-18 /pmc/articles/PMC6356971/ /pubmed/30669418 http://dx.doi.org/10.3390/genes10010066 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Yang, Chao Wang, Yu-Tian Zheng, Chun-Hou A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping |
title | A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping |
title_full | A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping |
title_fullStr | A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping |
title_full_unstemmed | A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping |
title_short | A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping |
title_sort | random walk based cluster ensemble approach for data integration and cancer subtyping |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356971/ https://www.ncbi.nlm.nih.gov/pubmed/30669418 http://dx.doi.org/10.3390/genes10010066 |
work_keys_str_mv | AT yangchao arandomwalkbasedclusterensembleapproachfordataintegrationandcancersubtyping AT wangyutian arandomwalkbasedclusterensembleapproachfordataintegrationandcancersubtyping AT zhengchunhou arandomwalkbasedclusterensembleapproachfordataintegrationandcancersubtyping AT yangchao randomwalkbasedclusterensembleapproachfordataintegrationandcancersubtyping AT wangyutian randomwalkbasedclusterensembleapproachfordataintegrationandcancersubtyping AT zhengchunhou randomwalkbasedclusterensembleapproachfordataintegrationandcancersubtyping |
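
The description field above contrasts RWCE with ensembles built on a co-association matrix and on a plain instance-cluster association matrix. The sketch below illustrates only those two baseline summaries, not RWCE itself; the random walk refinement, the scaled exponential kernel, and the spectral clustering step are not reproduced here. The function names `co_association` and `instance_cluster_matrix` and the toy label lists are hypothetical and do not come from the paper.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of instances shares a cluster.

    `labelings` is a list of base clusterings, each a list of cluster labels
    with one label per instance (hypothetical input format).
    """
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every base clustering must label every instance")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise the co-occurrence counts by the number of base clusterings.
    return [[c / m for c in row] for row in counts]


def instance_cluster_matrix(labelings):
    """Binary instance-by-cluster membership matrix.

    There is one column per (base clustering index, cluster label) pair.
    An entry is 1 only when the instance belongs to that cluster, so any
    similarity between different clusters is ignored.
    """
    n = len(labelings[0])
    columns = [(b, c)
               for b, labels in enumerate(labelings)
               for c in sorted(set(labels))]
    matrix = [[1 if labelings[b][i] == c else 0 for (b, c) in columns]
              for i in range(n)]
    return columns, matrix


# Toy ensemble: three base clusterings of four instances (made-up labels).
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
ca = co_association(base)
columns, membership = instance_cluster_matrix(base)
```

In the binary matrix every instance has exactly one nonzero entry per base clustering, which is the coarse encoding the description refers to. According to the description, RWCE instead fills the instance-cluster matrix with refined cluster-to-cluster similarities before spectral clustering.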