
A robust approach based on Weibull distribution for clustering gene expression data

BACKGROUND: Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on the distances, while few methods group genes according to the similarities of the distributions of the gene expression levels. Furthermore, as the biological annotatio...


Bibliographic Details
Main Authors:	Wang, Huakun; Wang, Zhenzhen; Li, Xia; Gong, Binsheng; Feng, Lixin; Zhou, Ying
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3118357/ https://www.ncbi.nlm.nih.gov/pubmed/21624141 http://dx.doi.org/10.1186/1748-7188-6-14

author	Wang, Huakun; Wang, Zhenzhen; Li, Xia; Gong, Binsheng; Feng, Lixin; Zhou, Ying
collection	PubMed
description	BACKGROUND: Clustering is a widely used technique for the analysis of gene expression data. Most clustering methods group genes based on distances, while few methods group genes according to the similarity of the distributions of their expression levels. Furthermore, as biological annotation resources have accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest. RESULTS: In this paper, we propose the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data in which the expression levels of each gene are treated as a random variable following its own Weibull distribution. The WDCM is based on the concept that genes with similar expression profiles have similar distribution parameters, and thus genes are clustered via their Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets, from lung cancer, B-cell follicular lymphoma and bladder carcinoma, and obtained well-clustered results. We compared the performance of the WDCM with k-means and the Self Organizing Map (SOM) using functional annotation information from the Gene Ontology (GO). The results showed that the functional annotation ratios of the WDCM are higher than those of the other methods. We also used an external measure, the Adjusted Rand Index, to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides better clustering performance than the k-means and SOM algorithms. A merit of the proposed WDCM is that it can cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of the WDCM was also evaluated on incomplete data sets.
CONCLUSIONS: The results demonstrate that the WDCM produces clusters with more consistent functional annotations than the other methods. The WDCM is also shown to be robust and capable of clustering gene expression data containing a small number of missing values.
format	Online Article Text
id	pubmed-3118357
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3118357 2011-06-20 A robust approach based on Weibull distribution for clustering gene expression data Wang, Huakun Wang, Zhenzhen Li, Xia Gong, Binsheng Feng, Lixin Zhou, Ying Algorithms Mol Biol Research BioMed Central 2011-05-31 /pmc/articles/PMC3118357/ /pubmed/21624141 http://dx.doi.org/10.1186/1748-7188-6-14 Text en Copyright ©2011 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A robust approach based on Weibull distribution for clustering gene expression data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3118357/ https://www.ncbi.nlm.nih.gov/pubmed/21624141 http://dx.doi.org/10.1186/1748-7188-6-14
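Illustrative note: the abstract states that each gene's expression values are treated as draws from its own Weibull distribution, and that incomplete profiles can be handled without imputation. The sketch below is not the authors' WDCM implementation. It only shows one plausible way to estimate per-gene Weibull parameters while skipping missing entries. The function name `fit_weibull`, the use of maximum likelihood with bisection, and the requirement that values be strictly positive are all assumptions made for illustration.

```python
import math
import random


def fit_weibull(values):
    """Maximum-likelihood (shape, scale) of a two-parameter Weibull.

    Hypothetical helper, not taken from the paper. Missing entries
    (None or NaN) are skipped rather than imputed. The remaining
    values are assumed to be strictly positive.
    """
    logs = [math.log(v) for v in values
            if v is not None and not math.isnan(v)]
    if len(set(logs)) < 2:
        raise ValueError("need at least two distinct observed values")
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)

    def score(k):
        # Likelihood equation for the shape k; it increases with k,
        # so its single root can be bracketed and bisected.
        # Weights are scaled by the largest value to avoid overflow.
        w = [math.exp(k * (l - max_log)) for l in logs]
        weighted = sum(wi * l for wi, l in zip(w, logs)) / sum(w)
        return weighted - 1.0 / k - mean_log

    lo, hi = 1e-6, 1.0
    while score(hi) < 0:          # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):          # plain bisection
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)

    # Scale follows in closed form once the shape is known.
    w = [math.exp(shape * (l - max_log)) for l in logs]
    scale = math.exp(max_log + math.log(sum(w) / len(w)) / shape)
    return shape, scale


# Usage: a simulated expression profile with two missing measurements.
random.seed(0)
profile = [random.weibullvariate(3.0, 2.0) for _ in range(200)]
profile[5] = None
profile[17] = float("nan")
shape, scale = fit_weibull(profile)
print(f"shape={shape:.2f}, scale={scale:.2f}")
```

Genes could then be compared through their fitted (shape, scale) pairs. How the WDCM itself groups genes from those parameters is not described in this record, so that step is left out.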
