Clustering Scatter Plots Using Data Depth Measures

Clustering is rapidly becoming a powerful data mining technique, and has been broadly applied to many domains such as bioinformatics and text mining. However, the existing methods can only deal with a data matrix of scalars. In this paper, we introduce a hierarchical clustering procedure that can ha...


Bibliographic Details
Main Authors: Zhang, Zhanpan; Cui, Xinping; Jeske, Daniel R; Li, Xiaoxiao; Braun, Jonathan; Borneman, James
Format: Online Article Text
Language: English
Published: 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4038101/
https://www.ncbi.nlm.nih.gov/pubmed/24883225
http://dx.doi.org/10.4172/2155-6180.S5-001
author Zhang, Zhanpan
Cui, Xinping
Jeske, Daniel R
Li, Xiaoxiao
Braun, Jonathan
Borneman, James
collection PubMed
description Clustering is rapidly becoming a powerful data mining technique, and has been broadly applied to many domains such as bioinformatics and text mining. However, the existing methods can only deal with a data matrix of scalars. In this paper, we introduce a hierarchical clustering procedure that can handle a data matrix of scatter plots. To more accurately reflect the nature of data, we introduce a dissimilarity statistic based on “data depth” to measure the discrepancy between two bivariate distributions without oversimplifying the nature of the underlying pattern. We then combine hypothesis testing with hierarchical clustering to simultaneously cluster the rows and columns of the data matrix of scatter plots. We also propose novel painting metrics and construct heat maps to allow visualization of the clusters. We demonstrate the utility and power of our new clustering method through simulation studies and application to a microbe-host-interaction study.
format Online
Article
Text
id pubmed-4038101
institution National Center for Biotechnology Information
language English
publishDate 2011
record_format MEDLINE/PubMed
spelling pubmed-4038101 2014-05-29 Clustering Scatter Plots Using Data Depth Measures. Zhang, Zhanpan; Cui, Xinping; Jeske, Daniel R; Li, Xiaoxiao; Braun, Jonathan; Borneman, James. J Biom Biostat. Article. 2011-12-25 2011 /pmc/articles/PMC4038101/ /pubmed/24883225 http://dx.doi.org/10.4172/2155-6180.S5-001 Text en Copyright: © 2011 Zhang Z, et al. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Clustering Scatter Plots Using Data Depth Measures
topic Article