
A trainable clustering algorithm based on shortest paths from density peaks

Clustering is a technique to analyze empirical data, with a major application for biomedical research. Essentially, clustering finds groups of related points in a dataset. However, results depend on both metrics for point-to-point similarity and rules for point-to-group association. Non-appropriate...


Bibliographic Details
Main authors: Pizzagalli, Diego Ulisse; Gonzalez, Santiago Fernandez; Krause, Rolf
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2019
Subjects: Research Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7051829/
https://www.ncbi.nlm.nih.gov/pubmed/32195334
http://dx.doi.org/10.1126/sciadv.aax3770
author Pizzagalli, Diego Ulisse
Gonzalez, Santiago Fernandez
Krause, Rolf
collection PubMed
description Clustering is a technique to analyze empirical data, with a major application for biomedical research. Essentially, clustering finds groups of related points in a dataset. However, results depend on both metrics for point-to-point similarity and rules for point-to-group association. Non-appropriate metrics and rules can lead to artifacts, especially in case of multiple groups with heterogeneous structure. In this work, we propose a clustering algorithm that evaluates the properties of paths between points (rather than point-to-point similarity) and solves a global optimization problem, finding solutions not obtainable by methods relying on local choices. Moreover, our algorithm is trainable. Hence, it can be adapted and adopted for specific datasets and applications by providing examples of valid and invalid paths to train a path classifier. We demonstrate its applicability to identify heterogeneous groups in challenging synthetic datasets, segment highly nonconvex immune cells in confocal microscopy images, and classify arrhythmic heartbeats in electrocardiographic signals.
format Online
Article
Text
id pubmed-7051829
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Association for the Advancement of Science
record_format MEDLINE/PubMed
spelling pubmed-7051829 2020-03-19 A trainable clustering algorithm based on shortest paths from density peaks. Pizzagalli, Diego Ulisse; Gonzalez, Santiago Fernandez; Krause, Rolf. Sci Adv. Research Articles. American Association for the Advancement of Science 2019-10-30 /pmc/articles/PMC7051829/ /pubmed/32195334 http://dx.doi.org/10.1126/sciadv.aax3770 Text en
Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY). This is an open-access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A trainable clustering algorithm based on shortest paths from density peaks
topic Research Articles
work_keys_str_mv AT pizzagallidiegoulisse atrainableclusteringalgorithmbasedonshortestpathsfromdensitypeaks
AT gonzalezsantiagofernandez atrainableclusteringalgorithmbasedonshortestpathsfromdensitypeaks
AT krauserolf atrainableclusteringalgorithmbasedonshortestpathsfromdensitypeaks
AT pizzagallidiegoulisse trainableclusteringalgorithmbasedonshortestpathsfromdensitypeaks
AT gonzalezsantiagofernandez trainableclusteringalgorithmbasedonshortestpathsfromdensitypeaks
AT krauserolf trainableclusteringalgorithmbasedonshortestpathsfromdensitypeaks
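
The abstract in this record describes clustering by evaluating paths from density peaks rather than point-to-point similarity. The following is a minimal Python sketch of that general idea only, not the authors' implementation: it picks peaks with a simple density-times-separation score, builds a k-nearest-neighbour graph, and assigns each point to the peak it reaches by the cheapest path. All function names, the `radius`, `k` and `n_peaks` parameters, and the `combine` hook are assumptions made for illustration. The summed edge length used as the path cost is a stand-in for the trained path classifier mentioned in the abstract, which is not reproduced here.

```python
import heapq
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pick_density_peaks(dists, radius, n_peaks):
    """Hypothetical peak selection: high local density, far from denser points."""
    density = (dists < radius).sum(axis=1) - 1      # neighbours within radius, excluding self
    order = np.argsort(-density, kind="stable")     # densest first, ties broken by index
    delta = np.empty(len(dists))
    delta[order[0]] = dists.max()                   # densest point gets the largest separation
    for rank in range(1, len(order)):
        i = order[rank]
        delta[i] = dists[i, order[:rank]].min()     # distance to nearest point ranked denser
    score = density * delta
    return [int(i) for i in np.argsort(-score, kind="stable")[:n_peaks]]


def knn_graph(dists, k):
    """Symmetric k-nearest-neighbour graph as {node: {neighbour: edge_length}}."""
    graph = {i: {} for i in range(len(dists))}
    for i in range(len(dists)):
        nearest = [int(j) for j in np.argsort(dists[i], kind="stable") if j != i][:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = float(dists[i, j])
    return graph


def assign_by_shortest_path(graph, peaks, combine=lambda acc, w: acc + w):
    """Multi-source Dijkstra: label each node with the peak of its cheapest path.

    `combine` accumulates a path cost edge by edge (assumed non-decreasing);
    the default sums edge lengths. Unreachable nodes keep the label -1.
    """
    best = {p: 0.0 for p in peaks}
    labels = np.full(len(graph), -1)
    heap = [(0.0, p, idx) for idx, p in enumerate(peaks)]
    heapq.heapify(heap)
    while heap:
        cost, node, label = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue                                # stale queue entry
        labels[node] = label
        for nxt, w in graph[node].items():
            new_cost = combine(cost, w)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, label))
    return labels
```

Passing `combine=max` would score a path by its largest single step instead of its total length, which is one simple way to make the assignment depend on a property of the whole path; whether that resembles the published method is not something this record states.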