Massive data clustering by multi-scale psychological observations

Clustering is the discovery of latent group structure in data and is a fundamental problem in artificial intelligence, and a vital procedure in data-driven scientific research over all disciplines. Yet, existing methods have various limitations, especially weak cognitive interpretability and poor co...

Bibliographic Details
Main authors: Yang, Shusen, Zhang, Liwen, Xu, Chen, Yu, Hanqiao, Fan, Jianqing, Xu, Zongben
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8889001/
https://www.ncbi.nlm.nih.gov/pubmed/35242339
http://dx.doi.org/10.1093/nsr/nwab183
_version_ 1784661292969099264
author Yang, Shusen
Zhang, Liwen
Xu, Chen
Yu, Hanqiao
Fan, Jianqing
Xu, Zongben
author_facet Yang, Shusen
Zhang, Liwen
Xu, Chen
Yu, Hanqiao
Fan, Jianqing
Xu, Zongben
author_sort Yang, Shusen
collection PubMed
description Clustering is the discovery of latent group structure in data and is a fundamental problem in artificial intelligence, and a vital procedure in data-driven scientific research over all disciplines. Yet, existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when it comes to clustering massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes, following the Weber–Fechner law to capture the gradually emerging meaningful grouping structure. We validated our approach in real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expressions, face images, computer logs and audios. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains.
format Online
Article
Text
id pubmed-8889001
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8889001 2022-03-02 Massive data clustering by multi-scale psychological observations Yang, Shusen Zhang, Liwen Xu, Chen Yu, Hanqiao Fan, Jianqing Xu, Zongben Natl Sci Rev Research Article Clustering is the discovery of latent group structure in data and is a fundamental problem in artificial intelligence, and a vital procedure in data-driven scientific research over all disciplines. Yet, existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when it comes to clustering massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes, following the Weber–Fechner law to capture the gradually emerging meaningful grouping structure. We validated our approach in real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expressions, face images, computer logs and audios. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains. Oxford University Press 2021-10-08 /pmc/articles/PMC8889001/ /pubmed/35242339 http://dx.doi.org/10.1093/nsr/nwab183 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Yang, Shusen
Zhang, Liwen
Xu, Chen
Yu, Hanqiao
Fan, Jianqing
Xu, Zongben
Massive data clustering by multi-scale psychological observations
title Massive data clustering by multi-scale psychological observations
title_full Massive data clustering by multi-scale psychological observations
title_fullStr Massive data clustering by multi-scale psychological observations
title_full_unstemmed Massive data clustering by multi-scale psychological observations
title_short Massive data clustering by multi-scale psychological observations
title_sort massive data clustering by multi-scale psychological observations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8889001/
https://www.ncbi.nlm.nih.gov/pubmed/35242339
http://dx.doi.org/10.1093/nsr/nwab183