
Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization

BACKGROUND: The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering which clusters genes and...


Bibliographic Details
Main authors: Cheng, Kin-On; Law, Ngai-Fong; Siu, Wan-Chi; Liew, Alan Wee-Chung
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2396181/
https://www.ncbi.nlm.nih.gov/pubmed/18433478
http://dx.doi.org/10.1186/1471-2105-9-210
author Cheng, Kin-On
Law, Ngai-Fong
Siu, Wan-Chi
Liew, Alan Wee-Chung
collection PubMed
description BACKGROUND: The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering which clusters genes and conditions simultaneously is preferred over the traditional clustering technique in discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of biclusters called additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis. RESULTS: We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as pCluster algorithm. By relaxing the constraint in homogeneity, the proposed algorithm has polynomial-time complexity in the worst case instead of exponential-time complexity as in the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. Comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm. CONCLUSION: We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and is capable of detecting co-regulated genes. 
The interactive analysis enables an optimum parameter determination in the biclustering algorithm so as to achieve the best result. In future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model.
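The description above refers to the additive bicluster model and to the pCluster algorithm without defining either. As a minimal sketch, assuming the standard pCluster definition (a submatrix is a delta-pCluster when every 2x2 submatrix with entries a_ij, a_il, a_kj, a_kl satisfies |(a_ij - a_il) - (a_kj - a_kl)| <= delta), the code below checks whether a chosen set of genes and conditions forms an additive bicluster. The function names `max_pscore` and `is_delta_pcluster` and the toy data are illustrative assumptions; this is a brute-force homogeneity check, not the authors' greedy algorithm.

```python
import numpy as np


def max_pscore(sub):
    """Largest pScore over all 2x2 submatrices of `sub` (rows = genes, columns = conditions)."""
    sub = np.asarray(sub, dtype=float)
    # Difference between every pair of columns, for every row: shape (rows, cols, cols).
    col_diff = sub[:, :, None] - sub[:, None, :]
    # For a fixed column pair, the pScore of rows i and k is |d_i - d_k|, so the
    # maximum over all row pairs is the range (max - min) of d across the rows.
    return float((col_diff.max(axis=0) - col_diff.min(axis=0)).max())


def is_delta_pcluster(data, rows, cols, delta):
    """True if the submatrix picked by `rows` and `cols` has every pScore <= delta."""
    sub = np.asarray(data, dtype=float)[np.ix_(rows, cols)]
    return max_pscore(sub) <= delta


if __name__ == "__main__":
    # Hypothetical toy data: one base expression profile over four conditions.
    base = np.array([1.0, 3.0, 2.0, 5.0])
    # Additive model: each gene is the base profile plus a gene-specific offset.
    additive = base[None, :] + np.array([0.0, 2.0, -1.0])[:, None]
    # Multiplicative model: each gene is the base profile times a gene-specific factor.
    multiplicative = base[None, :] * np.array([1.0, 2.0, 0.5])[:, None]

    print("additive:", max_pscore(additive))
    print("multiplicative, raw:", max_pscore(multiplicative))
    print("multiplicative, log-transformed:", max_pscore(np.log(multiplicative)))
```

A perfectly additive submatrix has a maximum pScore of zero, and noise raises it, which is why a tolerance delta is used. Taking logarithms of strictly positive data turns a multiplicative pattern into an additive one, which is one way a single additive check can cover both kinds of bicluster mentioned in the description.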
format Text
id pubmed-2396181
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-23961812008-05-28 Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization Cheng, Kin-On Law, Ngai-Fong Siu, Wan-Chi Liew, Alan Wee-Chung BMC Bioinformatics Methodology Article BACKGROUND: The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering which clusters genes and conditions simultaneously is preferred over the traditional clustering technique in discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of biclusters called additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis. RESULTS: We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as pCluster algorithm. By relaxing the constraint in homogeneity, the proposed algorithm has polynomial-time complexity in the worst case instead of exponential-time complexity as in the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. Comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm. 
CONCLUSION: We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and is capable of detecting co-regulated genes. The interactive analysis enables an optimum parameter determination in the biclustering algorithm so as to achieve the best result. In future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model. BioMed Central 2008-04-23 /pmc/articles/PMC2396181/ /pubmed/18433478 http://dx.doi.org/10.1186/1471-2105-9-210 Text en Copyright © 2008 Cheng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2396181/
https://www.ncbi.nlm.nih.gov/pubmed/18433478
http://dx.doi.org/10.1186/1471-2105-9-210