
Discovering biclusters in gene expression data based on high-dimensional linear geometries

BACKGROUND: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes exhibits a consistent pattern only over a su...


Bibliographic Details
Main authors: Gan, Xiangchao; Liew, Alan Wee-Chung; Yan, Hong
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386490/
https://www.ncbi.nlm.nih.gov/pubmed/18433477
http://dx.doi.org/10.1186/1471-2105-9-209
author Gan, Xiangchao
Liew, Alan Wee-Chung
Yan, Hong
collection PubMed
description BACKGROUND: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes exhibits a consistent pattern only over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting a consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns.
RESULTS: In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high-dimensional data space. This perspective views biclusters with different patterns as hyperplanes in a high-dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model, which unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on a human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes.
CONCLUSION: We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high-dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.
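The geometric idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the paper reports a Fast Hough transform for hyperplanes in a high-dimensional space, whereas the code below is a plain two-condition (2D) Hough accumulator. The function name `hough_line_2d`, its parameters, and the synthetic data are all hypothetical. Under the linear coherent model, genes in a bicluster satisfy y = k*x + c between two conditions; an additive pattern is the case k = 1, and a multiplicative pattern is the case c = 0. Either way the genes fall on a line, which the accumulator detects as a peak.

```python
import numpy as np

def hough_line_2d(points, n_theta=180, rho_res=0.1):
    """Vote in (theta, rho) space; return the strongest line and its inlier mask.

    Lines are parametrized as rho = x*cos(theta) + y*sin(theta).
    Illustrative only: a brute-force accumulator, not a Fast Hough transform.
    """
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # rho of every point for every candidate angle, shape (n_points, n_theta)
    rhos = points[:, 0:1] * np.cos(thetas) + points[:, 1:2] * np.sin(thetas)
    bins = np.round(rhos / rho_res).astype(int)
    offset = bins.min()
    n_rho = bins.max() - offset + 1
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for t in range(n_theta):
        acc[t] = np.bincount(bins[:, t] - offset, minlength=n_rho)
    t_best, r_best = np.unravel_index(acc.argmax(), acc.shape)
    inliers = (bins[:, t_best] - offset) == r_best
    return thetas[t_best], (r_best + offset) * rho_res, inliers

# Synthetic data (an assumption for illustration): 30 genes follow the
# additive pattern y = x + 2 across two conditions, 70 genes are noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 8.0, 30)
coherent = np.column_stack([x, x + 2.0])
noise = rng.uniform(0.0, 10.0, size=(70, 2))
points = np.vstack([coherent, noise])

theta, rho, inliers = hough_line_2d(points)
print(f"theta = {np.degrees(theta):.1f} deg, rho = {rho:.2f}, "
      f"genes on line = {inliers.sum()}")
```

The first 30 rows of `points` are the planted genes, so `inliers[:30]` should be all true; a few noise genes may also land in the winning bin, which is why the bin width `rho_res` acts as a tolerance on how coherent a gene must be.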
format Text
id pubmed-2386490
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2386490 2008-05-16 Discovering biclusters in gene expression data based on high-dimensional linear geometries Gan, Xiangchao Liew, Alan Wee-Chung Yan, Hong BMC Bioinformatics Methodology Article BioMed Central 2008-04-23 /pmc/articles/PMC2386490/ /pubmed/18433477 http://dx.doi.org/10.1186/1471-2105-9-209 Text en Copyright © 2008 Gan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovering biclusters in gene expression data based on high-dimensional linear geometries
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386490/
https://www.ncbi.nlm.nih.gov/pubmed/18433477
http://dx.doi.org/10.1186/1471-2105-9-209