
Evaluation of clustering algorithms for gene expression data

BACKGROUND: Cluster analysis is an integral part of high dimensional data analysis. In the context of large scale gene expression data, a filtered set of genes are grouped together according to their expression profiles using one of numerous clustering algorithms that exist in the statistics and mac...


Bibliographic Details
Main Authors:	Datta, Susmita; Datta, Somnath
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1780133/ https://www.ncbi.nlm.nih.gov/pubmed/17217509 http://dx.doi.org/10.1186/1471-2105-7-S4-S17

author	Datta, Susmita; Datta, Somnath
collection	PubMed
description	BACKGROUND: Cluster analysis is an integral part of high dimensional data analysis. In the context of large scale gene expression data, a filtered set of genes are grouped together according to their expression profiles using one of numerous clustering algorithms that exist in the statistics and machine learning literature. A closely related problem is that of selecting a clustering algorithm that is "optimal" in some sense from a rather impressive list of clustering algorithms that currently exist. RESULTS: In this paper, we propose two validation measures each with two parts: one measuring the statistical consistency (stability) of the clusters produced and the other representing their biological functional congruence. Smaller values of these indices indicate better performance for a clustering algorithm. We illustrate this approach using two case studies with publicly available gene expression data sets: one involving a SAGE data of breast cancer patients and the other involving a time course cDNA microarray data on yeast. Six well known clustering algorithms UPGMA, K-Means, Diana, Fanny, Model-Based and SOM were evaluated. CONCLUSION: No single clustering algorithm may be best suited for clustering genes into functional groups via expression profiles for all data sets. The validation measures introduced in this paper can aid in the selection of an optimal algorithm, for a given data set, from a collection of available clustering algorithms.
format	Text
id	pubmed-1780133
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1780133 2007-01-24 Evaluation of clustering algorithms for gene expression data Datta, Susmita; Datta, Somnath BMC Bioinformatics Research BioMed Central 2006-12-12 /pmc/articles/PMC1780133/ /pubmed/17217509 http://dx.doi.org/10.1186/1471-2105-7-S4-S17 Text en Copyright © 2006 Datta and Datta; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Evaluation of clustering algorithms for gene expression data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1780133/ https://www.ncbi.nlm.nih.gov/pubmed/17217509 http://dx.doi.org/10.1186/1471-2105-7-S4-S17
