
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

BACKGROUND: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focu...


Bibliographic Details
Main Authors:	Kim, Eun-Youn, Kim, Seon-Young, Ashlock, Daniel, Nam, Dougu
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743671/ https://www.ncbi.nlm.nih.gov/pubmed/19698124 http://dx.doi.org/10.1186/1471-2105-10-260

_version_	1782171873583300608
author	Kim, Eun-Youn Kim, Seon-Young Ashlock, Daniel Nam, Dougu
author_facet	Kim, Eun-Youn Kim, Seon-Young Ashlock, Daniel Nam, Dougu
author_sort	Kim, Eun-Youn
collection	PubMed
description	BACKGROUND: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. RESULTS: We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. CONCLUSION: The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.
format	Text
id	pubmed-2743671
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-27436712009-09-15 MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering Kim, Eun-Youn Kim, Seon-Young Ashlock, Daniel Nam, Dougu BMC Bioinformatics Methodology Article BACKGROUND: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. RESULTS: We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. CONCLUSION: The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. BioMed Central 2009-08-22 /pmc/articles/PMC2743671/ /pubmed/19698124 http://dx.doi.org/10.1186/1471-2105-10-260 Text en Copyright © 2009 Kim et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Kim, Eun-Youn Kim, Seon-Young Ashlock, Daniel Nam, Dougu MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
title	MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
title_full	MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
title_fullStr	MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
title_full_unstemmed	MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
title_short	MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
title_sort	multi-k: accurate classification of microarray subtypes using ensemble k-means clustering
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743671/ https://www.ncbi.nlm.nih.gov/pubmed/19698124 http://dx.doi.org/10.1186/1471-2105-10-260
work_keys_str_mv	AT kimeunyoun multikaccurateclassificationofmicroarraysubtypesusingensemblekmeansclustering AT kimseonyoung multikaccurateclassificationofmicroarraysubtypesusingensemblekmeansclustering AT ashlockdaniel multikaccurateclassificationofmicroarraysubtypesusingensemblekmeansclustering AT namdougu multikaccurateclassificationofmicroarraysubtypesusingensemblekmeansclustering
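
The description above says the method amalgamates multiple k-means runs with varying numbers of clusters and keeps the groups whose co-memberships are most robust. The following Python sketch illustrates that general idea only. It is not the authors' MULTI-K implementation (their code is C++ and available from them), and it omits the entropy-plot step entirely. The function names `kmeans_labels`, `co_membership`, and `robust_clusters`, the `threshold` parameter, and the use of connected components to extract groups are all assumptions made for illustration.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    # Start from k distinct data points (fancy indexing returns a copy).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def co_membership(X, k_values, runs_per_k=10, seed=0):
    """Fraction of k-means runs in which each pair of samples shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    counts = np.zeros((n, n))
    total = 0
    for k in k_values:  # the ensemble varies the number of clusters
        for _ in range(runs_per_k):
            labels = kmeans_labels(X, k, rng)
            counts += labels[:, None] == labels[None, :]
            total += 1
    return counts / total


def robust_clusters(C, threshold):
    """Group samples linked by co-membership >= threshold (connected components).

    Assumption: this grouping rule is a stand-in chosen for the sketch,
    not the rule used in the paper.
    """
    n = len(C)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # Unlabelled samples that co-cluster with sample i often enough.
            for j in np.flatnonzero((C[i] >= threshold) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels
```

For an expression matrix `X` with one row per sample, `robust_clusters(co_membership(X, k_values=range(2, 8)), threshold=0.5)` would return one group label per sample. Both the range of k and the threshold here are arbitrary illustrative values, not settings taken from the article.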
