Semi-supervised consensus clustering for gene expression data analysis
BACKGROUND: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustn...
Main authors: | Wang, Yunli, Pan, Youlian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4036113/ https://www.ncbi.nlm.nih.gov/pubmed/24920961 http://dx.doi.org/10.1186/1756-0381-7-7 |
_version_ | 1782318136267112448 |
---|---|
author | Wang, Yunli Pan, Youlian |
author_facet | Wang, Yunli Pan, Youlian |
author_sort | Wang, Yunli |
collection | PubMed |
description | BACKGROUND: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge into the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and domain knowledge. METHODS: We proposed semi-supervised consensus clustering (SSCC) to integrate consensus clustering with semi-supervised clustering for analyzing gene expression data. We investigated the roles of consensus clustering and prior knowledge in improving the quality of clustering. SSCC was compared with one semi-supervised clustering algorithm, one consensus clustering algorithm, and k-means. Experiments on eight gene expression datasets were performed using h-fold cross-validation. RESULTS: Using prior knowledge improved the clustering quality by reducing the impact of noise and high dimensionality in microarray data. Integrating consensus clustering with semi-supervised clustering improved performance compared to using consensus clustering or semi-supervised clustering separately. Our SSCC method outperformed the others tested in this paper. |
format | Online Article Text |
id | pubmed-4036113 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-40361132014-06-11 Semi-supervised consensus clustering for gene expression data analysis Wang, Yunli Pan, Youlian BioData Min Methodology BACKGROUND: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and domain knowledge. METHODS: We proposed semi-supervised consensus clustering (SSCC) to integrate the consensus clustering with semi-supervised clustering for analyzing gene expression data. We investigated the roles of consensus clustering and prior knowledge in improving the quality of clustering. SSCC was compared with one semi-supervised clustering algorithm, one consensus clustering algorithm, and k-means. Experiments on eight gene expression datasets were performed using h-fold cross-validation. RESULTS: Using prior knowledge improved the clustering quality by reducing the impact of noise and high dimensionality in microarray data. Integration of consensus clustering with semi-supervised clustering improved performance as compared to using consensus clustering or semi-supervised clustering separately. Our SSCC method outperformed the others tested in this paper. BioMed Central 2014-05-08 /pmc/articles/PMC4036113/ /pubmed/24920961 http://dx.doi.org/10.1186/1756-0381-7-7 Text en Copyright © 2014 Wang and Pan; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Wang, Yunli Pan, Youlian Semi-supervised consensus clustering for gene expression data analysis |
title | Semi-supervised consensus clustering for gene expression data analysis |
title_full | Semi-supervised consensus clustering for gene expression data analysis |
title_fullStr | Semi-supervised consensus clustering for gene expression data analysis |
title_full_unstemmed | Semi-supervised consensus clustering for gene expression data analysis |
title_short | Semi-supervised consensus clustering for gene expression data analysis |
title_sort | semi-supervised consensus clustering for gene expression data analysis |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4036113/ https://www.ncbi.nlm.nih.gov/pubmed/24920961 http://dx.doi.org/10.1186/1756-0381-7-7 |
work_keys_str_mv | AT wangyunli semisupervisedconsensusclusteringforgeneexpressiondataanalysis AT panyoulian semisupervisedconsensusclusteringforgeneexpressiondataanalysis |
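
The description above combines two ideas: consensus clustering (merging several partitions of the same samples) and semi-supervised clustering (using prior knowledge). The record does not give the SSCC algorithm itself, so the following is only a minimal illustrative sketch of those two ideas, not the authors' method. It assumes a co-association matrix with a majority-vote threshold for the consensus step and must-link pairs as the form of prior knowledge; the names `co_association`, `consensus_clusters`, `must_link`, and `threshold` are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(partitions, must_link=(), threshold=0.5):
    """Group samples that co-cluster in more than `threshold` of the partitions.

    Assumption (not from the source): prior knowledge is given as must-link
    pairs, which are merged regardless of the co-association score.
    """
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Consensus step: link pairs that most base partitions agree on.
    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            union(i, j)
    # Semi-supervised step: enforce the must-link constraints.
    for i, j in must_link:
        union(i, j)

    # Relabel the groups 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three base partitions of five samples; sample 2 is assigned inconsistently.
base = [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 2, 1, 1]]
unsupervised = consensus_clusters(base)
with_prior = consensus_clusters(base, must_link=[(2, 3)])
```

In this toy input the base partitions disagree about sample 2, so the consensus step alone leaves it in its own group; supplying one must-link pair places it with samples 3 and 4. Real gene expression data would need actual base clusterings (for example, repeated k-means runs) in place of the hand-written label lists.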