Entia Non Sunt Multiplicanda … Shall I look for clusters in my cognitive data?
Unsupervised clustering methods are increasingly being applied in psychology. Researchers may use such methods on multivariate data to reveal previously undetected sub-populations of individuals within a larger population. Realistic research scenarios in the cognitive science may not be ideally suit...
Main authors: | Toffalini, Enrico; Girardi, Paolo; Giofrè, David; Altoè, Gianmarco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9246139/ https://www.ncbi.nlm.nih.gov/pubmed/35771764 http://dx.doi.org/10.1371/journal.pone.0269584 |
_version_ | 1784738902315106304 |
---|---|
author | Toffalini, Enrico Girardi, Paolo Giofrè, David Altoè, Gianmarco |
author_facet | Toffalini, Enrico Girardi, Paolo Giofrè, David Altoè, Gianmarco |
author_sort | Toffalini, Enrico |
collection | PubMed |
description | Unsupervised clustering methods are increasingly being applied in psychology. Researchers may use such methods on multivariate data to reveal previously undetected sub-populations of individuals within a larger population. Realistic research scenarios in the cognitive science may not be ideally suited for a successful use of these methods, however, as they are characterized by modest effect sizes, limited sample sizes, and non-orthogonal indicators. This combination of characteristics even presents a high risk of detecting non-existing clusters. A systematic review showed that, among 191 studies published in 2016–2020 that used different clustering methods to classify human participants, the median sample size was only 322, and a median of 3 latent classes/clusters were detected. None of them concluded in favor of a one-cluster solution, potentially giving rise to an extreme publication bias. Dimensionality reduction techniques are almost never used before clustering. In a subsequent simulation study, we examined the performance of popular clustering techniques, including Gaussian mixture model, a partitioning, and a hierarchical agglomerative algorithm. We focused on their ability to detect the correct number of clusters, and on their classification accuracy. Under a reasoned set of scenarios that we considered plausible for the cognitive research, none of the methods adequately discriminates between one vs two true clusters. In addition, non-orthogonal indicators lead to a high risk of incorrectly detecting multiple clusters where none existed, even in the presence of only modest correlation (a frequent case in psychology). In conclusion, it is hard for researchers to be in a condition to achieve a valid unsupervised clustering for inferential purposes with a view to classifying individuals. |
format | Online Article Text |
id | pubmed-9246139 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9246139 2022-07-01 Entia Non Sunt Multiplicanda … Shall I look for clusters in my cognitive data? Toffalini, Enrico Girardi, Paolo Giofrè, David Altoè, Gianmarco PLoS One Research Article Public Library of Science 2022-06-30 /pmc/articles/PMC9246139/ /pubmed/35771764 http://dx.doi.org/10.1371/journal.pone.0269584 Text en © 2022 Toffalini et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Entia Non Sunt Multiplicanda … Shall I look for clusters in my cognitive data? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9246139/ https://www.ncbi.nlm.nih.gov/pubmed/35771764 http://dx.doi.org/10.1371/journal.pone.0269584 |
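
The description's warning that non-orthogonal indicators raise the risk of detecting clusters that do not exist can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: it uses a hand-written two-means (k-means with k = 2) as a stand-in for the partitioning methods mentioned in the record. The sample size, the correlation values, the seed, and all function names are assumptions chosen for illustration. It draws data from a single bivariate normal population and measures how much the within-cluster sum of squares (WSS) drops when the data are forced into two clusters.

```python
import math
import random


def simulate_one_population(n, rho, rng):
    """Draw n points from ONE bivariate normal population with correlation rho."""
    points = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing two independent normals gives unit variances and correlation rho.
        points.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return points


def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def centroid(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def two_means_wss(points, rng, n_iter=50):
    """Run Lloyd's algorithm with k = 2 and return the within-cluster sum of squares."""
    centers = rng.sample(points, 2)
    for _ in range(n_iter):
        groups = ([], [])
        for p in points:
            nearest = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            groups[nearest].append(p)
        # Keep the old center if a group happens to be empty.
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist2(p, centers[0]), dist2(p, centers[1])) for p in points)


rng = random.Random(2022)  # fixed seed (arbitrary) for reproducibility
ratios = {}
for rho in (0.0, 0.5):  # orthogonal vs modestly correlated indicators
    data = simulate_one_population(1000, rho, rng)
    overall = centroid(data)
    wss_one = sum(dist2(p, overall) for p in data)
    # Best of a few random restarts to reduce the effect of poor local optima.
    wss_two = min(two_means_wss(data, rng) for _ in range(5))
    ratios[rho] = wss_two / wss_one
    print(f"rho={rho}: WSS(k=2) / WSS(k=1) = {ratios[rho]:.2f}")
```

Both ratios fall below 1 even though only one population exists, and the drop is larger when the two indicators are correlated, because the point cloud is stretched along one axis and splits more "cleanly". A fit criterion that rewards this drop without an adequate penalty would therefore tend to favor two clusters over one, which is the pattern the description cautions against.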