
Unsupervised fuzzy pattern discovery in gene expression data

BACKGROUND: Discovering patterns from gene expression levels is regarded as a classification problem when tissue classes of the samples are given and solved as a discrete-data problem by discretizing the expression levels of each gene into intervals maximizing the interdependence between that gene a...


Bibliographic Details
Main Authors:	Wu, Gene PK, Chan, Keith CC, Wong, Andrew KC
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226255/ https://www.ncbi.nlm.nih.gov/pubmed/21989090 http://dx.doi.org/10.1186/1471-2105-12-S5-S5

collection	PubMed
description	BACKGROUND: Discovering patterns from gene expression levels is regarded as a classification problem when tissue classes of the samples are given and solved as a discrete-data problem by discretizing the expression levels of each gene into intervals maximizing the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. METHODS: For a gene pool with a large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, the one with the highest interdependence with others in the group, to drive the discretization of the gene expression levels of other genes. Treating intervals as discrete events, association patterns of events can be discovered. If the gene groups obtained are crisp gene clusters, significant patterns overlapping different gene clusters cannot be found. This paper presents a new method of “fuzzifying” the crisp gene clusters to overcome this problem. RESULTS: To evaluate the effectiveness of our approach, we first apply the above-described procedure to a synthetic data set and then to a gene expression data set with known class labels. The class labels are not used in either analysis but are used later as the ground truth in a classificatory problem for assessing the algorithm’s effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method. The existence of correlation among continuous-valued gene expression levels suggests that certain genes in the gene groups have high interdependence with other genes in the group. Fuzzification of a crisp gene cluster allows the cluster to take in genes from other clusters so that overlapping relationships among gene clusters can be uncovered. Hence, previously unknown hidden patterns residing in overlapping gene clusters are discovered.
From the experimental results, the high-order patterns discovered reveal multiple gene interaction patterns in cancerous tissues not found in normal tissues. It was also found that for the colon cancer experiment, 70% of the top patterns and most of the discriminative patterns between cancerous and normal tissues are among those spanning different crisp gene clusters. CONCLUSIONS: We show that the proposed method for analyzing the error-prone microarray is effective even without the presence of tissue class information. A unified framework is presented, allowing fast and accurate pattern discovery for gene expression data. For a large gene set, to discover a comprehensive set of patterns, gene clustering, gene expression discretization and gene cluster fuzzification are absolutely necessary.
format	Online Article Text
id	pubmed-3226255
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3226255 2011-11-30 BMC Bioinformatics Proceedings BioMed Central 2011-07-27 /pmc/articles/PMC3226255/ /pubmed/21989090 http://dx.doi.org/10.1186/1471-2105-12-S5-S5 Text en Copyright ©2011 Wu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Unsupervised fuzzy pattern discovery in gene expression data
topic	Proceedings
work_keys_str_mv	AT wugenepk unsupervisedfuzzypatterndiscoveryingeneexpressiondata AT chankeithcc unsupervisedfuzzypatterndiscoveryingeneexpressiondata AT wongandrewkc unsupervisedfuzzypatterndiscoveryingeneexpressiondata
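
The METHODS part of the description above (pick a representative gene per cluster, then let crisp clusters take in genes from other clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the expression levels are already discretized, uses plain mutual information as a stand-in for the paper's interdependence measure, and admits an outside gene when its mutual information with the representative reaches a hypothetical `threshold`. The gene names and values are made-up toy data.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def representative(cluster, data):
    """Gene with the highest total mutual information with the rest of its cluster."""
    return max(
        cluster,
        key=lambda g: sum(
            mutual_information(data[g], data[h]) for h in cluster if h != g
        ),
    )


def fuzzify(clusters, data, threshold):
    """Assumed fuzzification rule: each crisp cluster also takes in outside genes
    whose mutual information with the cluster's representative is >= threshold."""
    fuzzy = {}
    for name, members in clusters.items():
        rep = representative(members, data)
        extra = [
            g
            for g in data
            if g not in members
            and mutual_information(data[rep], data[g]) >= threshold
        ]
        fuzzy[name] = list(members) + extra
    return fuzzy


# Toy, made-up discretized expression levels (0 = low, 1 = high) over 8 samples.
data = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 1],
    "g2": [0, 0, 0, 0, 1, 1, 1, 1],
    "g3": [0, 0, 0, 1, 1, 1, 1, 1],
    "g4": [0, 1, 0, 1, 0, 1, 0, 1],
    "g5": [0, 1, 0, 1, 0, 1, 0, 1],
}
clusters = {"A": ["g1", "g2"], "B": ["g3", "g4", "g5"]}  # crisp clusters
fuzzy = fuzzify(clusters, data, threshold=0.5)
```

In this toy input, `g3` sits in crisp cluster B but tracks the representative of cluster A closely, so the fuzzified cluster A takes it in while B is unchanged. That overlap is the kind of cross-cluster relationship the description says crisp clusters cannot expose.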


Similar Items