Cargando…

Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies

BACKGROUND: The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data since genes may only jointly respond over a subset of conditions. Biclustering a...

Descripción completa

Detalles Bibliográficos
Autores principales: DiMaggio, Peter A, McAllister, Scott R, Floudas, Christodoulos A, Feng, Xiao-Jiang, Rabinowitz, Joshua D, Rabitz, Herschel A
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2008
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2605474/
https://www.ncbi.nlm.nih.gov/pubmed/18954459
http://dx.doi.org/10.1186/1471-2105-9-458
_version_ 1782162858498326528
author DiMaggio, Peter A
McAllister, Scott R
Floudas, Christodoulos A
Feng, Xiao-Jiang
Rabinowitz, Joshua D
Rabitz, Herschel A
author_facet DiMaggio, Peter A
McAllister, Scott R
Floudas, Christodoulos A
Feng, Xiao-Jiang
Rabinowitz, Joshua D
Rabitz, Herschel A
author_sort DiMaggio, Peter A
collection PubMed
description BACKGROUND: The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data since genes may only jointly respond over a subset of conditions. Biclustering algorithms also have important applications in sample classification where, for instance, tissue samples can be classified as cancerous or normal. Many of the methods for biclustering, and clustering algorithms in general, utilize simplified models or heuristic strategies for identifying the "best" grouping of elements according to some metric and cluster definition and thus result in suboptimal clusters. RESULTS: In this article, we present a rigorous approach to biclustering, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix so as to globally minimize the dissimilarity metric. The physical permutations of the rows and columns of the data matrix can be modeled as either a network flow problem or a traveling salesman problem. Cluster boundaries in one dimension are used to partition and re-order the other dimensions of the corresponding submatrices to generate biclusters. The performance of OREO is tested on (a) metabolite concentration data, (b) an image reconstruction matrix, (c) synthetic data with implanted biclusters, and gene expression data for (d) colon cancer data, (e) breast cancer data, as well as (f) yeast segregant data to validate the ability of the proposed method and compare it to existing biclustering and clustering methods. CONCLUSION: We demonstrate that this rigorous global optimization method for biclustering produces clusters with more insightful groupings of similar entities, such as genes or metabolites sharing common functions, than other clustering and biclustering algorithms and can reconstruct underlying fundamental patterns in the data for several distinct sets of data matrices arising in important biological applications.
format Text
id pubmed-2605474
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-26054742008-12-19 Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies DiMaggio, Peter A McAllister, Scott R Floudas, Christodoulos A Feng, Xiao-Jiang Rabinowitz, Joshua D Rabitz, Herschel A BMC Bioinformatics Research Article BACKGROUND: The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data since genes may only jointly respond over a subset of conditions. Biclustering algorithms also have important applications in sample classification where, for instance, tissue samples can be classified as cancerous or normal. Many of the methods for biclustering, and clustering algorithms in general, utilize simplified models or heuristic strategies for identifying the "best" grouping of elements according to some metric and cluster definition and thus result in suboptimal clusters. RESULTS: In this article, we present a rigorous approach to biclustering, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix so as to globally minimize the dissimilarity metric. The physical permutations of the rows and columns of the data matrix can be modeled as either a network flow problem or a traveling salesman problem. Cluster boundaries in one dimension are used to partition and re-order the other dimensions of the corresponding submatrices to generate biclusters. The performance of OREO is tested on (a) metabolite concentration data, (b) an image reconstruction matrix, (c) synthetic data with implanted biclusters, and gene expression data for (d) colon cancer data, (e) breast cancer data, as well as (f) yeast segregant data to validate the ability of the proposed method and compare it to existing biclustering and clustering methods. CONCLUSION: We demonstrate that this rigorous global optimization method for biclustering produces clusters with more insightful groupings of similar entities, such as genes or metabolites sharing common functions, than other clustering and biclustering algorithms and can reconstruct underlying fundamental patterns in the data for several distinct sets of data matrices arising in important biological applications. BioMed Central 2008-10-27 /pmc/articles/PMC2605474/ /pubmed/18954459 http://dx.doi.org/10.1186/1471-2105-9-458 Text en Copyright © 2008 DiMaggio et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
DiMaggio, Peter A
McAllister, Scott R
Floudas, Christodoulos A
Feng, Xiao-Jiang
Rabinowitz, Joshua D
Rabitz, Herschel A
Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
title Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
title_full Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
title_fullStr Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
title_full_unstemmed Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
title_short Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
title_sort biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2605474/
https://www.ncbi.nlm.nih.gov/pubmed/18954459
http://dx.doi.org/10.1186/1471-2105-9-458
work_keys_str_mv AT dimaggiopetera biclusteringviaoptimalreorderingofdatamatricesinsystemsbiologyrigorousmethodsandcomparativestudies
AT mcallisterscottr biclusteringviaoptimalreorderingofdatamatricesinsystemsbiologyrigorousmethodsandcomparativestudies
AT floudaschristodoulosa biclusteringviaoptimalreorderingofdatamatricesinsystemsbiologyrigorousmethodsandcomparativestudies
AT fengxiaojiang biclusteringviaoptimalreorderingofdatamatricesinsystemsbiologyrigorousmethodsandcomparativestudies
AT rabinowitzjoshuad biclusteringviaoptimalreorderingofdatamatricesinsystemsbiologyrigorousmethodsandcomparativestudies
AT rabitzherschela biclusteringviaoptimalreorderingofdatamatricesinsystemsbiologyrigorousmethodsandcomparativestudies