
Human-supervised clustering of multidimensional data using crowdsourcing

Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances beco...


Bibliographic Details
Main authors: Butyaev, Alexander; Drogaris, Chrisostomos; Tremblay-Savard, Olivier; Waldispühl, Jérôme
Format: Online Article Text
Language: English
Published: The Royal Society 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128850/
https://www.ncbi.nlm.nih.gov/pubmed/35620007
http://dx.doi.org/10.1098/rsos.211189
_version_ 1784712630256009216
author Butyaev, Alexander
Drogaris, Chrisostomos
Tremblay-Savard, Olivier
Waldispühl, Jérôme
collection PubMed
description Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems.
format Online
Article
Text
id pubmed-9128850
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-91288502022-05-25 Human-supervised clustering of multidimensional data using crowdsourcing Butyaev, Alexander Drogaris, Chrisostomos Tremblay-Savard, Olivier Waldispühl, Jérôme R Soc Open Sci Computer Science and Artificial Intelligence Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems. The Royal Society 2022-05-24 /pmc/articles/PMC9128850/ /pubmed/35620007 http://dx.doi.org/10.1098/rsos.211189 Text en © 2022 The Authors. https://creativecommons.org/licenses/by/4.0/Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, provided the original author and source are credited.
title Human-supervised clustering of multidimensional data using crowdsourcing
topic Computer Science and Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128850/
https://www.ncbi.nlm.nih.gov/pubmed/35620007
http://dx.doi.org/10.1098/rsos.211189