A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data
Data abnormalities and missing data are persistent problems for traditional multi-modal heterogeneous big data clustering. To address them, this paper establishes a multi-view heterogeneous big data clustering algorithm based on improved K-means clustering. At first, for the b...
Main Authors: | Yan, An; Wang, Wei; Ren, Yi; Geng, HongWei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Neuroscience |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8236595/ https://www.ncbi.nlm.nih.gov/pubmed/34194310 http://dx.doi.org/10.3389/fnbot.2021.680613 |
_version_ | 1783714571437998080 |
---|---|
author | Yan, An Wang, Wei Ren, Yi Geng, HongWei |
author_facet | Yan, An Wang, Wei Ren, Yi Geng, HongWei |
author_sort | Yan, An |
collection | PubMed |
description | Data abnormalities and missing data are persistent problems for traditional multi-modal heterogeneous big data clustering. To address them, this paper establishes a multi-view heterogeneous big data clustering algorithm based on improved K-means clustering. First, for big data that involve heterogeneous data, we use multi-view data analysis to propose an advanced K-means algorithm, built on a multi-view heterogeneous system, that determines the similarity detection metrics. Then, a BP neural network method is used to predict the missing attribute values, complete the missing data, and restore the big data structure in its heterogeneous state. Finally, we propose a data denoising algorithm to denoise the abnormal data. Based on these methods, we construct a framework, named BPK-means, to resolve the problems of data abnormalities and missing data. Our approach is evaluated through a rigorous performance evaluation study. Compared with the original algorithm, both theoretical verification and experimental results show that the accuracy of the proposed method is greatly improved. |
format | Online Article Text |
id | pubmed-8236595 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8236595 2021-06-29 A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data Yan, An Wang, Wei Ren, Yi Geng, HongWei Front Neurorobot Neuroscience Data abnormalities and missing data are persistent problems for traditional multi-modal heterogeneous big data clustering. To address them, this paper establishes a multi-view heterogeneous big data clustering algorithm based on improved K-means clustering. First, for big data that involve heterogeneous data, we use multi-view data analysis to propose an advanced K-means algorithm, built on a multi-view heterogeneous system, that determines the similarity detection metrics. Then, a BP neural network method is used to predict the missing attribute values, complete the missing data, and restore the big data structure in its heterogeneous state. Finally, we propose a data denoising algorithm to denoise the abnormal data. Based on these methods, we construct a framework, named BPK-means, to resolve the problems of data abnormalities and missing data. Our approach is evaluated through a rigorous performance evaluation study. Compared with the original algorithm, both theoretical verification and experimental results show that the accuracy of the proposed method is greatly improved. Frontiers Media S.A. 2021-06-14 /pmc/articles/PMC8236595/ /pubmed/34194310 http://dx.doi.org/10.3389/fnbot.2021.680613 Text en Copyright © 2021 Yan, Wang, Ren and Geng. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Yan, An Wang, Wei Ren, Yi Geng, HongWei A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data |
title | A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data |
title_full | A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data |
title_fullStr | A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data |
title_full_unstemmed | A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data |
title_short | A Clustering Algorithm for Multi-Modal Heterogeneous Big Data With Abnormal Data |
title_sort | clustering algorithm for multi-modal heterogeneous big data with abnormal data |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8236595/ https://www.ncbi.nlm.nih.gov/pubmed/34194310 http://dx.doi.org/10.3389/fnbot.2021.680613 |
work_keys_str_mv | AT yanan aclusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata AT wangwei aclusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata AT renyi aclusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata AT genghongwei aclusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata AT yanan clusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata AT wangwei clusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata AT renyi clusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata AT genghongwei clusteringalgorithmformultimodalheterogeneousbigdatawithabnormaldata |
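
The description field above outlines three stages: predict missing attribute values, denoise abnormal data, then cluster with K-means. The record gives no implementation details, so the following is only a minimal sketch of that stage order, not the authors' BPK-means. Every function name is hypothetical, and each stage uses a deliberately simple stand-in: nearest-complete-row imputation in place of the BP neural network, a median-distance filter in place of the paper's denoising algorithm, and plain single-view Lloyd's K-means in place of the improved multi-view variant.

```python
import math


def impute_nearest(rows):
    """Fill None entries from the closest complete row.

    Stand-in (assumption) for the BP neural network predictor named in the abstract.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        seen = [j for j, v in enumerate(r) if v is not None]
        # Compare only on the attributes this row actually has.
        donor = min(complete, key=lambda c: sum((c[j] - r[j]) ** 2 for j in seen))
        filled.append([donor[j] if v is None else v for j, v in enumerate(r)])
    return filled


def drop_outliers(rows, max_ratio=3.0):
    """Drop rows farther from the global centroid than max_ratio times the median distance.

    Stand-in (assumption) for the paper's denoising step.
    """
    centroid = [sum(col) / len(rows) for col in zip(*rows)]
    dists = [math.dist(r, centroid) for r in rows]
    median = sorted(dists)[len(dists) // 2]
    return [r for r, d in zip(rows, dists) if d <= max_ratio * median]


def kmeans(rows, k, iters=50):
    """Plain Lloyd's K-means. The first k rows seed the centers (a simplification)."""
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each row goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy data (invented for illustration): two groups, one missing value, one abnormal row.
data = [
    [0.0, 0.1], [0.2, None], [0.1, 0.0],
    [10.0, 10.1], [10.2, 9.9], [9.9, 10.0],
    [100.0, 100.0],
]
filled = impute_nearest(data)
clean = drop_outliers(filled)
labels, centers = kmeans(clean, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy input the missing value is filled from its nearest complete neighbour, the row at (100, 100) is filtered out, and the remaining six rows split into the two expected groups. Seeding the centers from the first k rows keeps the sketch deterministic; a practical implementation would normally use a randomized or k-means++ style initialization.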