
Clusterdv: a simple density-based clustering method that is robust, general and automatic



Bibliographic Details
Main authors: Marques, João C, Orger, Michael B
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581440/
https://www.ncbi.nlm.nih.gov/pubmed/30407500
http://dx.doi.org/10.1093/bioinformatics/bty932
collection PubMed
description MOTIVATION: How to partition a dataset into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data vary widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on a search for density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic, and fails on some simple data distributions. RESULTS: We propose an alternative approach, clusterdv, which estimates density dips between points, and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method is able to solve a range of synthetic and experimental datasets, where the underlying structure is known, and identifies consistent and meaningful clusters in new behavioral data. AVAILABILITY AND IMPLEMENTATION: clusterdv is implemented in Matlab. Its source code, together with example datasets, is available at: https://github.com/jcbmarques/clusterdv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
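The abstract describes clusterdv only as estimating "density dips between points". The sketch below illustrates that general idea under our own assumptions; it is not the authors' Matlab implementation, which is in the linked GitHub repository. It assumes a 2-D isotropic Gaussian kernel density estimate with a hand-picked bandwidth (unlike the paper's claim of no manual parameter adjustment), samples the density along the straight segment between two points, and scores the dip as the relative drop of the lowest sample below the weaker endpoint. The names `kde_at`, `density_dip`, `bandwidth` and `steps` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kde_at(data: Sequence[Point], x: Point, bandwidth: float) -> float:
    """Unnormalised isotropic Gaussian kernel density estimate at x.

    The 1/(2*pi*h^2) constant is omitted because only density ratios are used.
    """
    total = 0.0
    for px, py in data:
        d2 = (x[0] - px) ** 2 + (x[1] - py) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / len(data)


def density_dip(data: Sequence[Point], a: Point, b: Point,
                bandwidth: float = 0.5, steps: int = 50) -> float:
    """Hypothetical dip score between points a and b, in [0, 1].

    Samples the density along the straight segment a -> b and returns how far
    the lowest sample falls below the weaker of the two endpoint densities.
    0 means no dip; values near 1 mean the segment crosses a near-empty gap.
    """
    samples: List[float] = []
    for i in range(steps + 1):
        t = i / steps
        x = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        samples.append(kde_at(data, x, bandwidth))
    endpoint = min(samples[0], samples[-1])
    if endpoint == 0.0:
        # An endpoint lies so far from all data that its density underflowed.
        return 0.0
    return max(0.0, 1.0 - min(samples) / endpoint)


if __name__ == "__main__":
    # Two small, well-separated blobs (illustrative data, not from the paper).
    blob_a = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2)]
    blob_b = [(5.0, 0.0), (5.2, 0.1), (4.9, -0.1), (5.1, 0.2)]
    data = blob_a + blob_b
    print(density_dip(data, (0.0, 0.0), (0.2, 0.1)))  # near 0: same blob
    print(density_dip(data, (0.0, 0.0), (5.0, 0.0)))  # near 1: across the gap
```

A pairwise score of this kind could in principle separate points that share a dense region from points divided by a low-density gap; straight-line sampling and a fixed bandwidth are simplifications chosen here for brevity, and the published method may handle both differently.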
id pubmed-6581440
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Bioinformatics (Original Papers), Oxford University Press, 2019-06; published online 2018-11-08
rights © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers