Massive data clustering by multi-scale psychological observations
Main authors: | Yang, Shusen; Zhang, Liwen; Xu, Chen; Yu, Hanqiao; Fan, Jianqing; Xu, Zongben |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8889001/ https://www.ncbi.nlm.nih.gov/pubmed/35242339 http://dx.doi.org/10.1093/nsr/nwab183 |
author | Yang, Shusen; Zhang, Liwen; Xu, Chen; Yu, Hanqiao; Fan, Jianqing; Xu, Zongben |
---|---|
collection | PubMed |
description | Clustering, the discovery of latent group structure in data, is a fundamental problem in artificial intelligence and a vital procedure in data-driven scientific research across all disciplines. Yet existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when it comes to clustering the massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes according to the Weber–Fechner law, capturing the meaningful grouping structure as it gradually emerges. We validated our approach on real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expression data, face images, computer logs and audio recordings. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains. |
format | Online Article Text |
id | pubmed-8889001 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8889001, 2022-03-02; Natl Sci Rev; Research Article; Oxford University Press, 2021-10-08; © The Author(s) 2021. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Massive data clustering by multi-scale psychological observations |
topic | Research Article |
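The description states that the observation scale changes according to the Weber–Fechner law. That law says perceived magnitude grows with the logarithm of stimulus intensity (p = k·ln(S/S₀)), so equal perceptual steps correspond to a constant ratio between successive physical magnitudes. The Python sketch below is only a toy illustration of that reading, not the authors' algorithm: it spaces observation scales (grid cell sizes) geometrically and, at each scale, groups 2-D points by connected components of occupied grid cells. The grid-based grouping, the 8-connectivity, and the function names `weber_fechner_scales`, `grid_clusters` and `multiscale_observe` are all hypothetical choices made for illustration.

```python
import math
from collections import deque


def weber_fechner_scales(s_min, s_max, num_scales):
    """Geometrically spaced observation scales (hypothetical helper).

    Assumption: equal perceptual steps under the Weber-Fechner law map to a
    constant ratio between successive physical scales.
    """
    if s_min <= 0 or s_max < s_min:
        raise ValueError("need 0 < s_min <= s_max")
    if num_scales < 2:
        return [s_min]
    ratio = (s_max / s_min) ** (1.0 / (num_scales - 1))
    return [s_min * ratio ** i for i in range(num_scales)]


def grid_clusters(points, cell):
    """Label 2-D points by connected components of occupied grid cells."""
    # Map each point to the integer grid cell that contains it.
    cells = [(math.floor(x / cell), math.floor(y / cell)) for x, y in points]
    occupied = set(cells)
    label_of = {}
    next_label = 0
    for start in occupied:
        if start in label_of:
            continue
        # Breadth-first search over 8-connected occupied neighbour cells.
        label_of[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in occupied and nb not in label_of:
                        label_of[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return [label_of[c] for c in cells]


def multiscale_observe(points, s_min, s_max, num_scales):
    """Return (scale, number_of_clusters) for each observation scale."""
    result = []
    for s in weber_fechner_scales(s_min, s_max, num_scales):
        labels = grid_clusters(points, s)
        result.append((s, len(set(labels))))
    return result


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),  # tight group A
           (1.0, 0.0), (1.1, 0.1),              # group B, close to A
           (10.0, 10.0), (10.1, 10.0)]          # distant group C
    for scale, k in multiscale_observe(pts, 0.01, 100.0, 4):
        print(f"scale={scale:.3f} clusters={k}")
```

On this toy input the cluster count falls from 7 (every point isolated) to 3, then 2, then 1 as the scale coarsens, so nearby groups merge before distant ones. Geometric spacing spends more scales at fine resolutions, where structure changes fastest; a uniform grid with 8-connectivity can also merge diagonally touching cells, which is an artifact of this sketch rather than a property of the published method.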