Effect of data normalization on fuzzy clustering of DNA microarray data

BACKGROUND: Microarray technology has made it possible to simultaneously measure the expression levels of large numbers of genes in a short time. Gene expression data is information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms...

Bibliographic Details
Main Authors:	Kim, Seo Young, Lee, Jae Won, Bae, Jong Sung
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1431564/ https://www.ncbi.nlm.nih.gov/pubmed/16533412 http://dx.doi.org/10.1186/1471-2105-7-134

author	Kim, Seo Young Lee, Jae Won Bae, Jong Sung
collection	PubMed
description	BACKGROUND: Microarray technology has made it possible to simultaneously measure the expression levels of large numbers of genes in a short time. Gene expression data is information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. Clustering is an important tool for finding groups of genes with similar expression patterns in microarray data analysis. However, hard clustering methods, which assign each gene exactly to one cluster, are poorly suited to the analysis of microarray datasets because in such datasets the clusters of genes frequently overlap. RESULTS: In this study we applied the fuzzy partitional clustering method known as Fuzzy C-Means (FCM) to overcome the limitations of hard clustering. To identify the effect of data normalization, we used three normalization methods, the two common scale and location transformations and Lowess normalization methods, to normalize three microarray datasets and three simulated datasets. First we determined the optimal parameters for FCM clustering. We found that the optimal fuzzification parameter in the FCM analysis of a microarray dataset depended on the normalization method applied to the dataset during preprocessing. We additionally evaluated the effect of normalization of noisy datasets on the results obtained when hard clustering or FCM clustering was applied to those datasets. The effects of normalization were evaluated using both simulated datasets and microarray datasets. A comparative analysis showed that the clustering results depended on the normalization method used and the noisiness of the data. In particular, the selection of the fuzzification parameter value for the FCM method was sensitive to the normalization method used for datasets with large variations across samples. 
CONCLUSION: Lowess normalization is more robust for clustering of genes from general microarray data than the two common scale and location adjustment methods when samples have varying expression patterns or are noisy. In particular, the FCM method slightly outperformed the hard clustering methods when the expression patterns of genes overlapped and was advantageous in finding co-regulated genes. Thus, the FCM approach offers a convenient method for finding subsets of genes that are strongly associated to a given cluster.
format	Text
id	pubmed-1431564
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1431564 2006-04-21 Effect of data normalization on fuzzy clustering of DNA microarray data Kim, Seo Young Lee, Jae Won Bae, Jong Sung BMC Bioinformatics Research Article BioMed Central 2006-03-14 /pmc/articles/PMC1431564/ /pubmed/16533412 http://dx.doi.org/10.1186/1471-2105-7-134 Text en Copyright © 2006 Kim et al; licensee BioMed Central Ltd.
title	Effect of data normalization on fuzzy clustering of DNA microarray data
topic	Research Article

Similar Items