Bayesian Distance Clustering

Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data...

Bibliographic Details
Main Authors: Duan, Leo L, Dunson, David B
Format: Online Article Text
Language: English
Published: 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9245927/
https://www.ncbi.nlm.nih.gov/pubmed/35782785
author Duan, Leo L
Dunson, David B
collection PubMed
description Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance- and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data.
format Online
Article
Text
id pubmed-9245927
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-9245927 2022-06-30 Bayesian Distance Clustering Duan, Leo L Dunson, David B J Mach Learn Res Article 2021 /pmc/articles/PMC9245927/ /pubmed/35782785 Text en License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v22/20-688.html.
title Bayesian Distance Clustering
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9245927/
https://www.ncbi.nlm.nih.gov/pubmed/35782785