Human-supervised clustering of multidimensional data using crowdsourcing
Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances beco...
Main authors: | Butyaev, Alexander; Drogaris, Chrisostomos; Tremblay-Savard, Olivier; Waldispühl, Jérôme |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society 2022 |
Subjects: | Computer Science and Artificial Intelligence |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128850/ https://www.ncbi.nlm.nih.gov/pubmed/35620007 http://dx.doi.org/10.1098/rsos.211189 |
_version_ | 1784712630256009216 |
---|---|
author | Butyaev, Alexander Drogaris, Chrisostomos Tremblay-Savard, Olivier Waldispühl, Jérôme |
author_facet | Butyaev, Alexander Drogaris, Chrisostomos Tremblay-Savard, Olivier Waldispühl, Jérôme |
author_sort | Butyaev, Alexander |
collection | PubMed |
description | Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems. |
format | Online Article Text |
id | pubmed-9128850 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | The Royal Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-91288502022-05-25 Human-supervised clustering of multidimensional data using crowdsourcing Butyaev, Alexander Drogaris, Chrisostomos Tremblay-Savard, Olivier Waldispühl, Jérôme R Soc Open Sci Computer Science and Artificial Intelligence Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems. The Royal Society 2022-05-24 /pmc/articles/PMC9128850/ /pubmed/35620007 http://dx.doi.org/10.1098/rsos.211189 Text en © 2022 The Authors. https://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited. |
spellingShingle | Computer Science and Artificial Intelligence Butyaev, Alexander Drogaris, Chrisostomos Tremblay-Savard, Olivier Waldispühl, Jérôme Human-supervised clustering of multidimensional data using crowdsourcing |
title | Human-supervised clustering of multidimensional data using crowdsourcing |
title_full | Human-supervised clustering of multidimensional data using crowdsourcing |
title_fullStr | Human-supervised clustering of multidimensional data using crowdsourcing |
title_full_unstemmed | Human-supervised clustering of multidimensional data using crowdsourcing |
title_short | Human-supervised clustering of multidimensional data using crowdsourcing |
title_sort | human-supervised clustering of multidimensional data using crowdsourcing |
topic | Computer Science and Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128850/ https://www.ncbi.nlm.nih.gov/pubmed/35620007 http://dx.doi.org/10.1098/rsos.211189 |
work_keys_str_mv | AT butyaevalexander humansupervisedclusteringofmultidimensionaldatausingcrowdsourcing AT drogarischrisostomos humansupervisedclusteringofmultidimensionaldatausingcrowdsourcing AT tremblaysavardolivier humansupervisedclusteringofmultidimensionaldatausingcrowdsourcing AT waldispuhljerome humansupervisedclusteringofmultidimensionaldatausingcrowdsourcing |