
Robust clustering in high dimensional data using statistical depths

BACKGROUND: Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although componentwise median is a more robust alternative, it can be a poor center representative for high dimensional data. We need a new algorithm that is robust and works well in high dimensional da...


Bibliographic Details
Main Authors: Ding, Yuanyuan; Dang, Xin; Peng, Hanxiang; Wilkins, Dawn
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099500/
https://www.ncbi.nlm.nih.gov/pubmed/18047731
http://dx.doi.org/10.1186/1471-2105-8-S7-S8
author Ding, Yuanyuan
Dang, Xin
Peng, Hanxiang
Wilkins, Dawn
collection PubMed
description BACKGROUND: Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although componentwise median is a more robust alternative, it can be a poor center representative for high dimensional data. We need a new algorithm that is robust and works well in high dimensional data sets, e.g., gene expression data. RESULTS: Here we propose a new robust divisive clustering algorithm, the bisecting k-spatialMedian, based on the statistical spatial depth. A new subcluster selection rule, Relative Average Depth, is also introduced. We demonstrate that the proposed clustering algorithm outperforms the componentwise-median-based bisecting k-median algorithm for high dimension and low sample size (HDLSS) data via applications of the algorithms on two real HDLSS gene expression data sets. When further applied on noisy real data sets, the proposed algorithm compares favorably in terms of robustness with the componentwise-median-based bisecting k-median algorithm. CONCLUSION: Statistical data depths provide an alternative way to find the "center" of multivariate data sets and are useful and robust for clustering.
format Text
id pubmed-2099500
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2099500 2007-12-03 Robust clustering in high dimensional data using statistical depths. Ding, Yuanyuan; Dang, Xin; Peng, Hanxiang; Wilkins, Dawn. BMC Bioinformatics, Proceedings. BioMed Central, 2007-11-01. /pmc/articles/PMC2099500/ /pubmed/18047731 http://dx.doi.org/10.1186/1471-2105-8-S7-S8 Text, en. Copyright © 2007 Ding et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Robust clustering in high dimensional data using statistical depths
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099500/
https://www.ncbi.nlm.nih.gov/pubmed/18047731
http://dx.doi.org/10.1186/1471-2105-8-S7-S8
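
The abstract in this record centers on the statistical spatial depth and the spatial median as a robust "center". As a rough illustration only, the sketch below computes the standard sample spatial depth (one minus the norm of the average unit vector pointing from the data to a query point) and approximates the spatial median with a Weiszfeld-style iteration. The function names `spatial_depth` and `spatial_median`, the choice of Weiszfeld as the solver, and the toy data are assumptions made for this example; the record does not describe the authors' implementation, the bisecting procedure, or the Relative Average Depth rule, and none of those are reproduced here.

```python
import math


def spatial_depth(x, points):
    """Sample spatial depth of x with respect to points (standard definition)."""
    dim = len(x)
    total = [0.0] * dim
    for p in points:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:  # a point coinciding with x contributes the zero vector
            for j in range(dim):
                total[j] += diff[j] / norm
    mean_norm = math.sqrt(sum(t * t for t in total)) / len(points)
    return 1.0 - mean_norm


def spatial_median(points, iters=500, tol=1e-10):
    """Approximate the spatial median with Weiszfeld iterations (assumed solver)."""
    n, dim = len(points), len(points[0])
    # Start from the coordinate-wise mean.
    y = [sum(p[j] for p in points) / n for j in range(dim)]
    for _ in range(iters):
        num, den = [0.0] * dim, 0.0
        for p in points:
            dist = math.sqrt(sum((yj - pj) ** 2 for yj, pj in zip(y, p)))
            if dist < tol:
                continue  # simple safeguard against division by zero
            w = 1.0 / dist
            den += w
            for j in range(dim):
                num[j] += w * p[j]
        if den == 0.0:
            break
        new_y = [v / den for v in num]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_y, y)))
        y = new_y
        if shift < tol:
            break
    return y


# Toy data (hypothetical): four corners of a unit square plus one gross outlier.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
center = spatial_median(data)
mean = [sum(p[j] for p in data) / len(data) for j in range(2)]
print("spatial median:", center)
print("mean:", mean)
print("depth at median vs. at outlier:",
      spatial_depth(center, data), spatial_depth(data[-1], data))
```

On this toy data the mean is dragged far toward the outlier, while the spatial median stays near the square, which is the kind of robustness the abstract refers to.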