DCG++: A data-driven metric for geometric pattern recognition

Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of this challenge comes from the difficulty in defining a similarity measure between the data points that captures the underlying geometry of those data points. In this paper, we pr...

Bibliographic Details
Main Authors: Guan, Jiahui; Hsieh, Fushing; Koehl, Patrice
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6553753/
https://www.ncbi.nlm.nih.gov/pubmed/31170208
http://dx.doi.org/10.1371/journal.pone.0217838
_version_ 1783424869932728320
author Guan, Jiahui
Hsieh, Fushing
Koehl, Patrice
collection PubMed
description Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of the difficulty lies in defining a similarity measure between data points that captures their underlying geometry. In this paper, we propose an algorithm, DCG++, that generates a similarity measure that is both data-driven and ultrametric. DCG++ uses Markov chain random walks to capture the intrinsic geometry of the data, scans the possible scales, and combines this information using a simple procedure that is shown to generate an ultrametric. We validate the effectiveness of this similarity measure in the context of clustering on synthetic data with complex geometry, on a real-world data set containing segmented audio records of frog calls described by mel-frequency cepstral coefficients, and on an image segmentation problem. The experimental results show a significant improvement in performance with the DCG-based ultrametric compared to an empirical distance measure.
format Online
Article
Text
id pubmed-6553753
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6553753 2019-06-17 DCG++: A data-driven metric for geometric pattern recognition Guan, Jiahui Hsieh, Fushing Koehl, Patrice PLoS One Research Article Public Library of Science 2019-06-06 /pmc/articles/PMC6553753/ /pubmed/31170208 http://dx.doi.org/10.1371/journal.pone.0217838 Text en © 2019 Guan et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title DCG++: A data-driven metric for geometric pattern recognition
topic Research Article
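
The abstract in this record contrasts an ultrametric with an empirical distance measure. As a reader's aid, the sketch below illustrates what that property means: a distance u is ultrametric when u(i, j) <= max(u(i, k), u(k, j)) for every triple of points. This is not the DCG++ procedure, which according to the abstract relies on Markov chain random walks scanned over several scales. It is the classical subdominant (single-linkage, minimax-path) ultrametric, swapped in here only because it is short and well known. The function names and the four-point example are assumptions made for illustration.

```python
from itertools import product


def subdominant_ultrametric(dist):
    """Minimax-path ultrametric of a symmetric distance matrix.

    NOTE: an illustrative stand-in, not the DCG++ algorithm.
    u[i][j] becomes the smallest possible 'largest hop' over all
    paths from i to j (a Floyd-Warshall-style update).
    """
    n = len(dist)
    u = [row[:] for row in dist]  # copy so the input is left untouched
    for k in range(n):            # intermediate point must be the outer loop
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u, tol=1e-12):
    """Check the strong triangle inequality for every triple (i, j, k)."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i, j, k in product(range(n), repeat=3)
    )


# Hypothetical toy data: four points on a line.
points = [0.0, 1.0, 2.0, 10.0]
empirical = [[abs(a - b) for b in points] for a in points]
ultra = subdominant_ultrametric(empirical)

print(is_ultrametric(empirical))  # the plain distance fails the check
print(is_ultrametric(ultra))      # the transformed distance passes it
```

In the toy data the three nearby points end up mutually at distance 1 and all at distance 8 from the outlier, so the ultrametric encodes a two-level hierarchy directly, which is the kind of structure a clustering step can then read off.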