
Incremental genetic K-means algorithm and its application in gene expression data analysis

BACKGROUND: In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, and self-organizing maps (SOM), genes are partitioned into groups based on the similarity between their...


Bibliographic Details
Main Authors:	Lu, Yi; Lu, Shiyong; Fotouhi, Farshad; Deng, Youping; Brown, Susan J
Format:	Text
Language:	English
Published:	BioMed Central 2004
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC543472/ https://www.ncbi.nlm.nih.gov/pubmed/15511294 http://dx.doi.org/10.1186/1471-2105-5-172

collection	PubMed
description	BACKGROUND: In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, and self-organizing maps (SOM), genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective clustering methods must be developed to process this growing amount of biological data. RESULTS: In this paper, we propose a new clustering algorithm, the Incremental Genetic K-means Algorithm (IGKA). IGKA is an extension of our previously proposed clustering algorithm, the Fast Genetic K-means Algorithm (FGKA). IGKA outperforms FGKA when the mutation probability is small. The main idea of IGKA is to calculate the objective value, the Total Within-Cluster Variation (TWCV), and the cluster centroids incrementally whenever the mutation probability is small. IGKA inherits the salient feature of FGKA of always converging to the global optimum. C program is freely available at CONCLUSIONS: Our experiments indicate that, while IGKA has a convergence pattern similar to that of FGKA, it runs faster once the mutation probability falls below a certain point. Finally, we used IGKA to cluster a yeast dataset and found that it increased the enrichment of genes of similar function within the cluster.
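The incremental idea mentioned in the description can be illustrated with a small sketch. It is not taken from the authors' C program: the class name `IncrementalTWCV`, its methods, and the helper `twcv_from_scratch` are hypothetical, and the update rule relies only on the standard identity TWCV = Σ‖x‖² − Σ_k ‖S_k‖² / n_k, where S_k is the sum of the points in cluster k and n_k is its size. Under that assumption, moving one point between clusters only touches two cluster sums and two sizes, so nothing has to be rescanned.

```python
from typing import List, Optional, Sequence


def twcv_from_scratch(points: Sequence[Sequence[float]],
                      labels: Sequence[int], k: int) -> float:
    """Reference computation: sum of squared distances to each cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dims = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total


class IncrementalTWCV:
    """Keeps per-cluster feature sums and sizes (hypothetical helper, not the paper's code)."""

    def __init__(self, points: Sequence[Sequence[float]],
                 labels: Sequence[int], k: int) -> None:
        self.points = [list(p) for p in points]
        self.labels = list(labels)
        dims = len(self.points[0])
        self.sums: List[List[float]] = [[0.0] * dims for _ in range(k)]
        self.sizes = [0] * k
        self.total_sq = 0.0  # sum of squared feature values over all points
        for p, c in zip(self.points, self.labels):
            self.sizes[c] += 1
            for d, v in enumerate(p):
                self.sums[c][d] += v
                self.total_sq += v * v

    def move(self, i: int, new_cluster: int) -> None:
        """Reassign point i; only the old and new clusters are updated."""
        old = self.labels[i]
        if old == new_cluster:
            return
        for d, v in enumerate(self.points[i]):
            self.sums[old][d] -= v
            self.sums[new_cluster][d] += v
        self.sizes[old] -= 1
        self.sizes[new_cluster] += 1
        self.labels[i] = new_cluster

    def centroid(self, c: int) -> Optional[List[float]]:
        """Centroid of cluster c, or None if the cluster is empty."""
        n = self.sizes[c]
        return [s / n for s in self.sums[c]] if n else None

    def twcv(self) -> float:
        """TWCV from the maintained sums, skipping empty clusters."""
        value = self.total_sq
        for s, n in zip(self.sums, self.sizes):
            if n:
                value -= sum(v * v for v in s) / n
        return value


if __name__ == "__main__":
    pts = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
    state = IncrementalTWCV(pts, [0, 0, 1, 1], k=2)
    state.move(1, 1)  # a single "mutation": point 1 switches cluster
    print(state.twcv(), twcv_from_scratch(pts, state.labels, 2))
```

When few points change cluster per generation, as happens with a small mutation probability, the cost of each update depends on the number of moved points rather than on the size of the dataset, which is consistent with the condition stated in the description.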
id	pubmed-543472
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-543472 2005-01-08 Incremental genetic K-means algorithm and its application in gene expression data analysis. BMC Bioinformatics, Methodology Article. BioMed Central, 2004-10-28. http://dx.doi.org/10.1186/1471-2105-5-172 Text en Copyright © 2004 Lu et al; licensee BioMed Central Ltd.
