A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data

Bibliographic Details
Main authors: Li, Li, Guo, Yang, Wu, Wenwu, Shi, Youyi, Cheng, Jian, Tao, Shiheng
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3447720/
https://www.ncbi.nlm.nih.gov/pubmed/22824157
http://dx.doi.org/10.1186/1756-0381-5-8
_version_ 1782244149350629376
author Li, Li
Guo, Yang
Wu, Wenwu
Shi, Youyi
Cheng, Jian
Tao, Shiheng
author_facet Li, Li
Guo, Yang
Wu, Wenwu
Shi, Youyi
Cheng, Jian
Tao, Shiheng
author_sort Li, Li
collection PubMed
description BACKGROUND: Several biclustering algorithms have been proposed to identify biclusters, in which genes share similar expression patterns across a number of conditions. However, different algorithms yield different biclusters and can lead to distinct conclusions. Therefore, testing and comparison of these algorithms are strongly required. METHODS: In this study, five biclustering algorithms (i.e. BIMAX, FABIA, ISA, QUBIC and SAMBA) were compared on two expression datasets of different dimensions (GDS1620 and pathway) from Arabidopsis thaliana (A. thaliana). GO (gene ontology) annotation and the PPI (protein-protein interaction) network were used to verify the biological significance of the biclusters from the five algorithms. To compare the algorithms' performance and evaluate the quality of the identified biclusters, two scoring methods, namely weighted enrichment (WE) scoring and PPI scoring, were proposed in our study. For each dataset, combining the scores of all biclusters into one unified ranking allowed us to evaluate the performance and behavior of the five biclustering algorithms. RESULTS: Both the WE and PPI scoring methods proved effective in validating the biological significance of the biclusters, and a significant positive correlation between the two sets of scores demonstrated the consistency of the two methods. The comparative study of the five algorithms revealed that: (1) ISA is the most effective of the five algorithms on the GDS1620 dataset, and BIMAX outperforms the other algorithms on the pathway dataset. (2) Both ISA and BIMAX are data-dependent. The former does not work well on datasets with few genes, while the latter performs well on datasets with more conditions. (3) FABIA and QUBIC perform poorly in this study; they may be better suited to large datasets with more genes and more conditions. (4) SAMBA, in contrast, is data-independent, as it performs well on both datasets. The comparison results provide useful information for researchers choosing a suitable algorithm for a given dataset.
format Online
Article
Text
id pubmed-3447720
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3447720 2012-09-25 A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data Li, Li Guo, Yang Wu, Wenwu Shi, Youyi Cheng, Jian Tao, Shiheng BioData Min Methodology BioMed Central 2012-07-23 /pmc/articles/PMC3447720/ /pubmed/22824157 http://dx.doi.org/10.1186/1756-0381-5-8 Text en Copyright ©2012 Li et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology
Li, Li
Guo, Yang
Wu, Wenwu
Shi, Youyi
Cheng, Jian
Tao, Shiheng
A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
title A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
title_full A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
title_fullStr A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
title_full_unstemmed A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
title_short A comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
title_sort comparison and evaluation of five biclustering algorithms by quantifying goodness of biclusters for gene expression data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3447720/
https://www.ncbi.nlm.nih.gov/pubmed/22824157
http://dx.doi.org/10.1186/1756-0381-5-8