On an ensemble algorithm for clustering cancer patient data

BACKGROUND: The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. Therefore, the TNM should be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating...

Bibliographic Details
Main authors: Qi, Ran, Wu, Dengyuan, Sheng, Li, Henson, Donald, Schwartz, Arnold, Xu, Eric, Xing, Kai, Chen, Dechang
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3854654/
https://www.ncbi.nlm.nih.gov/pubmed/24565417
http://dx.doi.org/10.1186/1752-0509-7-S4-S9
author Qi, Ran
Wu, Dengyuan
Sheng, Li
Henson, Donald
Schwartz, Arnold
Xu, Eric
Xing, Kai
Chen, Dechang
collection PubMed
description BACKGROUND: The TNM staging system is based on three anatomic prognostic factors: Tumor, Lymph Node and Metastasis. However, cancer is no longer considered an anatomic disease. Therefore, the TNM should be expanded to accommodate new prognostic factors in order to increase the accuracy of estimating cancer patient outcome. The ensemble algorithm for clustering cancer data (EACCD) by Chen et al. reflects an effort to expand the TNM without changing its basic definitions. Though results on using EACCD have been reported, there has been no study on the analysis of the algorithm. In this report, we examine various aspects of EACCD using a large breast cancer patient dataset. We compared the output of EACCD with the corresponding survival curves, investigated the effect of different settings in EACCD, and compared EACCD with alternative clustering approaches. RESULTS: Using the basic T and N definitions, EACCD generated a dendrogram that shows a graphic relationship among the survival curves of the breast cancer patients. The dendrograms from EACCD are robust for large values of m (the number of runs in the learning step). When m is large, the dendrograms depend on the linkage functions. The statistical tests, however, employed in the learning step have minimal effect on the dendrogram for large m. In addition, if omitting the step for learning dissimilarity in EACCD, the resulting approaches can have a degraded performance. Furthermore, clustering only based on prognostic factors could generate misleading dendrograms, and direct use of partitioning techniques could lead to misleading assignments to clusters. CONCLUSIONS: When only the Partitioning Around Medoids (PAM) algorithm is involved in the step of learning dissimilarity, large values of m are required to obtain robust dendrograms, and for a large m EACCD can effectively cluster cancer patient data.
format Online
Article
Text
id pubmed-3854654
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3854654 2013-12-16 On an ensemble algorithm for clustering cancer patient data Qi, Ran Wu, Dengyuan Sheng, Li Henson, Donald Schwartz, Arnold Xu, Eric Xing, Kai Chen, Dechang BMC Syst Biol Research BioMed Central 2013-10-23 /pmc/articles/PMC3854654/ /pubmed/24565417 http://dx.doi.org/10.1186/1752-0509-7-S4-S9 Text en Copyright © 2013 Qi et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title On an ensemble algorithm for clustering cancer patient data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3854654/
https://www.ncbi.nlm.nih.gov/pubmed/24565417
http://dx.doi.org/10.1186/1752-0509-7-S4-S9