
New resampling method for evaluating stability of clusters


Bibliographic Details
Main Authors: Gana Dresen, Irina M; Boes, Tanja; Huesing, Johannes; Neuhaeuser, Markus; Joeckel, Karl-Heinz
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2265265/
https://www.ncbi.nlm.nih.gov/pubmed/18218074
http://dx.doi.org/10.1186/1471-2105-9-42
collection PubMed
description BACKGROUND: Hierarchical clustering is widely used in the analysis of microarray gene expression data. Assessing cluster stability is a major challenge in clustering procedures, and statistical methods are required to distinguish real clusters from random ones. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. Whereas bootstrapping loses approximately one third of the original items, continuous weights avoid zero elements and instead allow non-integer diagonal elements. This retains the full dimensionality of the space, i.e. each variable of the original data set is represented in the resampled data. RESULTS: A comparison of continuous weights and bootstrapping on real datasets and in simulation studies shows the advantage of continuous weights, especially when the dataset has only a few observations, there are few differentially expressed genes, and the fold change of the differentially expressed genes is low. CONCLUSION: We recommend the use of continuous weights in small as well as large datasets because, according to our results, they perform at least as well as conventional bootstrapping and in some cases surpass it.
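
The contrast drawn in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the record does not say how the continuous weights are drawn, so the uniform draws rescaled to sum to n, the weighted squared Euclidean distance, and all function names below are assumptions made purely for illustration. The sketch shows that bootstrap counts leave a sizeable share of items with weight zero (the expected share tends to 1/e, roughly 0.37, as n grows), while strictly positive continuous weights keep every item in play.

```python
import random


def bootstrap_weights(n, rng):
    """Integer weights: how often each of n items is drawn in one bootstrap sample."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def continuous_weights(n, rng):
    """Strictly positive, non-integer weights rescaled to sum to n.

    ASSUMPTION: uniform draws on (0, 1]; the weight distribution used in the
    paper is not stated in this record.
    """
    draws = [1.0 - rng.random() for _ in range(n)]  # each value lies in (0, 1]
    total = sum(draws)
    return [n * d / total for d in draws]


def omitted_fraction(weights):
    """Share of items whose weight is exactly zero, i.e. absent from the resample."""
    return sum(1 for w in weights if w == 0) / len(weights)


def weighted_sq_distance(x, y, weights):
    """Squared Euclidean distance with one weight per variable (a diagonal weight matrix)."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


rng = random.Random(0)
n = 1000
boot = bootstrap_weights(n, rng)
cont = continuous_weights(n, rng)

print("bootstrap: share of items with zero weight =", omitted_fraction(boot))
print("continuous: share of items with zero weight =", omitted_fraction(cont))
```

In a stability analysis along these lines, the weighted distances would be recomputed for each new weight vector and passed to a hierarchical clustering routine, and a cluster would count as stable if it reappears in most replicates.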
format Text
id pubmed-2265265
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2265265 2008-03-07
BMC Bioinformatics
Methodology Article
BioMed Central 2008-01-24
/pmc/articles/PMC2265265/ /pubmed/18218074 http://dx.doi.org/10.1186/1471-2105-9-42
Text en
Copyright © 2008 Gana Dresen et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title New resampling method for evaluating stability of clusters
topic Methodology Article