
Biclustering of microarray data with MOSPO based on crowding distance

Bibliographic Details
Main authors: Liu, Junwan; Li, Zhoujun; Hu, Xiaohua; Chen, Yiming
Format: Text
Language: English
Published: BioMed Central, 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2681067/
https://www.ncbi.nlm.nih.gov/pubmed/19426457
http://dx.doi.org/10.1186/1471-2105-10-S4-S9
collection PubMed
description BACKGROUND: High-throughput microarray technologies have generated massive amounts of gene expression data, with datasets containing the expression levels of thousands of genes under hundreds of different experimental conditions. Microarray datasets are usually represented as 2D matrices in which rows correspond to genes and columns to experimental conditions. Analyzing such datasets can reveal local structures: sets of genes that show coherent expression patterns under subsets of the experimental conditions. This has motivated the development of sophisticated algorithms capable of extracting novel and useful knowledge from a biomedical point of view. In the medical domain, these patterns help in understanding various diseases and support more accurate diagnosis, prognosis, treatment planning, and drug discovery. RESULTS: In this work we present CMOPSOB (Crowding distance based Multi-objective Particle Swarm Optimization Biclustering), a novel approach for microarray datasets that simultaneously clusters genes and conditions that are highly related within sub-portions of the data. The goal of biclustering is to find sub-matrices, i.e. maximal subgroups of genes and subgroups of conditions such that the genes exhibit highly correlated activity over those conditions. Because enlarging a bicluster and keeping its genes highly correlated are mutually conflicting objectives, the problem is a natural candidate for multi-objective modelling. CMOPSOB is based on multi-objective particle swarm optimization, a heuristic search technique that simulates the movement of a flock of birds searching for food. In addition, nearest-neighbour search strategies based on crowding distance and ϵ-dominance let the search converge rapidly to the Pareto front while guaranteeing the diversity of solutions. We compare this methodology with other biclustering algorithms on two common, publicly available gene expression datasets.
In all cases, our method finds localized structures: sets of genes that show consistent expression patterns across subsets of experimental conditions. The mined patterns show significant biological relevance in terms of related biological processes, cellular components, and molecular functions, in a species-independent manner. CONCLUSION: The proposed CMOPSOB algorithm was successfully applied to biclustering of microarray datasets. It achieves good diversity in the obtained Pareto front together with rapid convergence, and is therefore a useful tool for analyzing large microarray datasets.
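The abstract names three ingredients without defining them: a coherence objective for a bicluster, ϵ-dominance, and crowding distance. The Python sketch below shows one conventional way to compute each; it is not the authors' CMOPSOB code. Assumptions: coherence is scored with Cheng and Church's mean squared residue (the abstract does not name its measure), ϵ-dominance is the additive variant, and a bounded external archive drops its most crowded member when full, as in NSGA-II-style crowding. All function names (`mean_squared_residue`, `bicluster_objectives`, `eps_dominates`, `crowding_distance`, `update_archive`) are hypothetical, and the particle position and velocity updates are omitted.

```python
import math
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # objective vector; every component is minimized


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int], cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue of the sub-matrix rows x cols.
    Assumed coherence measure: 0 means a perfectly additive (coherent) bicluster."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_r) for j in range(n_c))
    return total / (n_r * n_c)


def bicluster_objectives(matrix, rows, cols) -> Objectives:
    """Hypothetical objective pair: low residue and large size (size negated so both are minimized)."""
    return (mean_squared_residue(matrix, rows, cols), -float(len(rows) * len(cols)))


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for minimization: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eps_dominates(a: Objectives, b: Objectives, eps: float) -> bool:
    """Additive epsilon-dominance: a, relaxed by eps in every objective, is no worse than b."""
    return all(x - eps <= y for x, y in zip(a, b))


def crowding_distance(front: List[Objectives]) -> List[float]:
    """NSGA-II crowding distance; boundary points on each objective get infinity."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for k in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue  # all points tie on this objective: no spread to measure
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


def update_archive(archive: List[Objectives], candidate: Objectives,
                   eps: float, capacity: int) -> List[Objectives]:
    """Bounded external archive (hypothetical helper, not the paper's exact procedure)."""
    # Reject a candidate that some archived solution already eps-dominates.
    if any(eps_dominates(a, candidate, eps) for a in archive):
        return list(archive)
    # Otherwise drop archived solutions the candidate Pareto-dominates, then add it.
    kept = [a for a in archive if not dominates(candidate, a)]
    kept.append(candidate)
    # Over capacity: remove the most crowded member (smallest crowding distance).
    if len(kept) > capacity:
        d = crowding_distance(kept)
        kept.pop(min(range(len(kept)), key=d.__getitem__))
    return kept


if __name__ == "__main__":
    m = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # rows differ by constant shifts
    print(bicluster_objectives(m, [0, 1, 2], [0, 1, 2]))
    archive: List[Objectives] = []
    for point in [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]:
        archive = update_archive(archive, point, eps=0.0, capacity=3)
    print(archive)
```

In the usage example the fourth point overflows a capacity-3 archive, so the interior point with the smallest crowding distance is dropped while the two extremes of the front, whose distance is infinite, are always kept. This is the diversity-preserving behaviour the abstract attributes to crowding distance; a larger `eps` coarsens the archive by rejecting near-duplicate solutions.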
id pubmed-2681067
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2681067 2009-05-13 BMC Bioinformatics, Proceedings. BioMed Central, 2009-04-29. Copyright © 2009 Liu et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings