Entia Non Sunt Multiplicanda … Shall I look for clusters in my cognitive data?

Unsupervised clustering methods are increasingly being applied in psychology. Researchers may use such methods on multivariate data to reveal previously undetected sub-populations of individuals within a larger population. Realistic research scenarios in cognitive science may not be ideally suit...

Bibliographic Details
Main authors: Toffalini, Enrico, Girardi, Paolo, Giofrè, David, Altoè, Gianmarco
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9246139/
https://www.ncbi.nlm.nih.gov/pubmed/35771764
http://dx.doi.org/10.1371/journal.pone.0269584
author Toffalini, Enrico
Girardi, Paolo
Giofrè, David
Altoè, Gianmarco
collection PubMed
description Unsupervised clustering methods are increasingly being applied in psychology. Researchers may use such methods on multivariate data to reveal previously undetected sub-populations of individuals within a larger population. Realistic research scenarios in cognitive science may not be ideally suited for successful use of these methods, however, as they are characterized by modest effect sizes, limited sample sizes, and non-orthogonal indicators. This combination of characteristics even presents a high risk of detecting nonexistent clusters. A systematic review showed that, among 191 studies published in 2016–2020 that used different clustering methods to classify human participants, the median sample size was only 322, and a median of 3 latent classes/clusters were detected. None of them concluded in favor of a one-cluster solution, potentially giving rise to an extreme publication bias. Dimensionality reduction techniques are almost never used before clustering. In a subsequent simulation study, we examined the performance of popular clustering techniques, including a Gaussian mixture model, a partitioning algorithm, and a hierarchical agglomerative algorithm. We focused on their ability to detect the correct number of clusters, and on their classification accuracy. Under a reasoned set of scenarios that we considered plausible for cognitive research, none of the methods adequately discriminates between one versus two true clusters. In addition, non-orthogonal indicators lead to a high risk of incorrectly detecting multiple clusters where none exist, even in the presence of only modest correlation (a frequent case in psychology). In conclusion, it is hard for researchers to be in a position to achieve valid unsupervised clustering for inferential purposes with a view to classifying individuals.
format Online
Article
Text
id pubmed-9246139
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9246139 2022-07-01 Entia Non Sunt Multiplicanda … Shall I look for clusters in my cognitive data? Toffalini, Enrico; Girardi, Paolo; Giofrè, David; Altoè, Gianmarco. PLoS One. Research Article. Public Library of Science 2022-06-30 /pmc/articles/PMC9246139/ /pubmed/35771764 http://dx.doi.org/10.1371/journal.pone.0269584 Text en © 2022 Toffalini et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Entia Non Sunt Multiplicanda … Shall I look for clusters in my cognitive data?
topic Research Article