Robust clustering in high dimensional data using statistical depths
Main authors: | Ding, Yuanyuan; Dang, Xin; Peng, Hanxiang; Wilkins, Dawn |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099500/ https://www.ncbi.nlm.nih.gov/pubmed/18047731 http://dx.doi.org/10.1186/1471-2105-8-S7-S8 |
_version_ | 1782138320654958592 |
---|---|
author | Ding, Yuanyuan; Dang, Xin; Peng, Hanxiang; Wilkins, Dawn |
collection | PubMed |
description | BACKGROUND: Mean-based clustering algorithms such as bisecting k-means generally lack robustness. Although componentwise median is a more robust alternative, it can be a poor center representative for high dimensional data. We need a new algorithm that is robust and works well in high dimensional data sets e.g. gene expression data. RESULTS: Here we propose a new robust divisive clustering algorithm, the bisecting k-spatialMedian, based on the statistical spatial depth. A new subcluster selection rule, Relative Average Depth, is also introduced. We demonstrate that the proposed clustering algorithm outperforms the componentwise-median-based bisecting k-median algorithm for high dimension and low sample size (HDLSS) data via applications of the algorithms on two real HDLSS gene expression data sets. When further applied on noisy real data sets, the proposed algorithm compares favorably in terms of robustness with the componentwise-median-based bisecting k-median algorithm. CONCLUSION: Statistical data depths provide an alternative way to find the "center" of multivariate data sets and are useful and robust for clustering. |
format | Text |
id | pubmed-2099500 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2099500 2007-12-03. Robust clustering in high dimensional data using statistical depths. Ding, Yuanyuan; Dang, Xin; Peng, Hanxiang; Wilkins, Dawn. BMC Bioinformatics, Proceedings. BioMed Central, 2007-11-01. http://dx.doi.org/10.1186/1471-2105-8-S7-S8. Text, en. Copyright © 2007 Ding et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Robust clustering in high dimensional data using statistical depths |
topic | Proceedings |
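The abstract contrasts the componentwise median with a center derived from statistical spatial depth, but the record gives no formulas. The following is a minimal sketch that assumes the standard sample spatial depth, D(x) = 1 - ‖(1/n) Σᵢ (x - Xᵢ)/‖x - Xᵢ‖‖, whose maximizer is the spatial median (the point minimizing the sum of Euclidean distances). That median is computed here with a simplified Weiszfeld iteration. The `bisect` function is a hypothetical illustration of a single two-way split with spatial-median centers, in the style of bisecting k-means. It is not the authors' bisecting k-spatialMedian and does not implement their Relative Average Depth selection rule. All function names are illustrative.

```python
import math
import statistics
from typing import List, Sequence

Point = Sequence[float]


def _sub(a: Point, b: Point) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _norm(a: Point) -> float:
    return math.sqrt(sum(x * x for x in a))


def componentwise_median(data: Sequence[Point]) -> List[float]:
    # Median of each coordinate taken separately.
    return [statistics.median(col) for col in zip(*data)]


def spatial_depth(x: Point, data: Sequence[Point]) -> float:
    # Sample spatial depth: 1 - || average of unit vectors (x - X_i)/||x - X_i|| ||.
    # Data points equal to x contribute a zero vector.
    acc = [0.0] * len(x)
    for xi in data:
        diff = _sub(x, xi)
        r = _norm(diff)
        if r > 0:
            acc = [a + d / r for a, d in zip(acc, diff)]
    return 1.0 - _norm([a / len(data) for a in acc])


def spatial_median(data: Sequence[Point], tol: float = 1e-10,
                   max_iter: int = 1000) -> List[float]:
    # Weiszfeld iteration for the point minimizing the sum of Euclidean
    # distances (equivalently, maximizing sample spatial depth).
    # Simplification: data points that coincide with the current estimate are
    # skipped instead of being handled by the Vardi-Zhang modification.
    y = list(componentwise_median(data))  # starting point
    for _ in range(max_iter):
        num = [0.0] * len(y)
        den = 0.0
        for xi in data:
            r = _norm(_sub(xi, y))
            if r < 1e-12:
                continue
            num = [a + v / r for a, v in zip(num, xi)]
            den += 1.0 / r
        if den == 0.0:
            return y  # every data point coincides with y
        y_new = [a / den for a in num]
        if _norm(_sub(y_new, y)) < tol:
            return y_new
        y = y_new
    return y


def bisect(data: Sequence[Point], max_iter: int = 50) -> List[int]:
    # Hypothetical two-way split: assign points to the nearer of two centers,
    # then recompute each center as the spatial median of its group.
    # Seeds: the point farthest from the overall spatial median, then the
    # point farthest from that first seed.
    m = spatial_median(data)
    c1 = max(data, key=lambda p: _norm(_sub(p, m)))
    c2 = max(data, key=lambda p: _norm(_sub(p, c1)))
    centers = [list(c1), list(c2)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            0 if _norm(_sub(p, centers[0])) <= _norm(_sub(p, centers[1])) else 1
            for p in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        groups = [[p for p, lab in zip(data, labels) if lab == k] for k in (0, 1)]
        if not groups[0] or not groups[1]:
            break  # degenerate split; stop early
        centers = [spatial_median(g) for g in groups]
    return labels


if __name__ == "__main__":
    # The 5 standard basis vectors in 5 dimensions.
    basis = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
    print("componentwise median:", componentwise_median(basis))
    print("spatial median:", [round(v, 3) for v in spatial_median(basis)])
    blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print("bisect labels:", bisect(blobs))
```

The basis-vector example shows why the componentwise median can be a poor center in higher dimensions. It is the origin, which lies outside the convex hull of the five points. The spatial median is (0.2, ..., 0.2), where the sample spatial depth reaches 1, compared with 1 - 1/√5 ≈ 0.553 at the origin.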