DCG++: A data-driven metric for geometric pattern recognition
Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of this challenge comes from the difficulty in defining a similarity measure between the data points that captures the underlying geometry of those data points. In this paper, we pr...
Main Authors: | Guan, Jiahui, Hsieh, Fushing, Koehl, Patrice |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6553753/ https://www.ncbi.nlm.nih.gov/pubmed/31170208 http://dx.doi.org/10.1371/journal.pone.0217838 |
_version_ | 1783424869932728320 |
---|---|
author | Guan, Jiahui Hsieh, Fushing Koehl, Patrice |
author_facet | Guan, Jiahui Hsieh, Fushing Koehl, Patrice |
author_sort | Guan, Jiahui |
collection | PubMed |
description | Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of this challenge comes from the difficulty in defining a similarity measure between the data points that captures the underlying geometry of those data points. In this paper, we propose an algorithm, DCG++, that generates such a similarity measure, one that is data-driven and ultrametric. DCG++ uses Markov Chain Random Walks to capture the intrinsic geometry of data, scans possible scales, and combines all this information using a simple procedure that is shown to generate an ultrametric. We validate the effectiveness of this similarity measure within the context of clustering on synthetic data with complex geometry, on a real-world data set containing segmented audio records of frog calls described by mel-frequency cepstral coefficients, as well as on an image segmentation problem. The experimental results show a significant improvement in performance with the DCG-based ultrametric compared to using an empirical distance measure. |
format | Online Article Text |
id | pubmed-6553753 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-65537532019-06-17 DCG++: A data-driven metric for geometric pattern recognition Guan, Jiahui Hsieh, Fushing Koehl, Patrice PLoS One Research Article Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of this challenge comes from the difficulty in defining a similarity measure between the data points that captures the underlying geometry of those data points. In this paper, we propose an algorithm, DCG++ that generates such a similarity measure that is data-driven and ultrametric. DCG++ uses Markov Chain Random Walks to capture the intrinsic geometry of data, scans possible scales, and combines all this information using a simple procedure that is shown to generate an ultrametric. We validate the effectiveness of this similarity measure within the context of clustering on synthetic data with complex geometry, on a real-world data set containing segmented audio records of frog calls described by mel-frequency cepstral coefficients, as well as on an image segmentation problem. The experimental results show a significant improvement on performance with the DCG-based ultrametric compared to using an empirical distance measure. Public Library of Science 2019-06-06 /pmc/articles/PMC6553753/ /pubmed/31170208 http://dx.doi.org/10.1371/journal.pone.0217838 Text en © 2019 Guan et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Guan, Jiahui Hsieh, Fushing Koehl, Patrice DCG++: A data-driven metric for geometric pattern recognition |
title | DCG++: A data-driven metric for geometric pattern recognition |
title_full | DCG++: A data-driven metric for geometric pattern recognition |
title_fullStr | DCG++: A data-driven metric for geometric pattern recognition |
title_full_unstemmed | DCG++: A data-driven metric for geometric pattern recognition |
title_short | DCG++: A data-driven metric for geometric pattern recognition |
title_sort | dcg++: a data-driven metric for geometric pattern recognition |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6553753/ https://www.ncbi.nlm.nih.gov/pubmed/31170208 http://dx.doi.org/10.1371/journal.pone.0217838 |
work_keys_str_mv | AT guanjiahui dcgadatadrivenmetricforgeometricpatternrecognition AT hsiehfushing dcgadatadrivenmetricforgeometricpatternrecognition AT koehlpatrice dcgadatadrivenmetricforgeometricpatternrecognition |