A Parallel Multiobjective PSO Weighted Average Clustering Algorithm Based on Apache Spark

Multiobjective clustering algorithm using particle swarm optimization has been applied successfully in some applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it difficult for existing algorithms to handle larg...

Bibliographic Details
Main Authors:	Ling, Huidong; Zhu, Xinmu; Zhu, Tao; Nie, Mingxing; Liu, Zhenghai; Liu, Zhenyu
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955697/ https://www.ncbi.nlm.nih.gov/pubmed/36832627 http://dx.doi.org/10.3390/e25020259

collection	PubMed
description	Multiobjective clustering algorithms using particle swarm optimization have been applied successfully in some applications. However, existing algorithms are implemented on a single machine and cannot be directly parallelized on a cluster, which makes it difficult for them to handle large-scale data. With the development of distributed parallel computing frameworks, data parallelism was proposed. However, increasing parallelism leads to unbalanced data distribution, which affects the clustering result. In this paper, we propose a parallel multiobjective PSO weighted average clustering algorithm based on Apache Spark (Spark-MOPSO-Avg). First, the entire data set is divided into multiple partitions and cached in memory using the distributed, parallel, memory-based computing of Apache Spark. The local fitness value of each particle is calculated in parallel from the data in each partition. After the calculation is completed, only particle information is transmitted, and there is no need to transmit a large number of data objects between nodes, which reduces network communication and thus the algorithm's running time. Second, a weighted average of the local fitness values is computed to mitigate the effect of unbalanced data distribution on the results. Experimental results show that the Spark-MOPSO-Avg algorithm achieves lower information loss under data parallelism, losing about 1% to 9% accuracy, but can effectively reduce the algorithm's time overhead. It shows good execution efficiency and parallel computing capability on a Spark distributed cluster.
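
The weighted-average step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the fitness function or the weights, so the sketch assumes a hypothetical fitness (mean squared distance from each point to its nearest centroid) and weights proportional to partition size. The names `local_fitness` and `weighted_average_fitness` are made up for illustration, and plain Python lists stand in for Spark partitions; on a real cluster the per-partition step would presumably run inside something like an RDD `mapPartitions` call.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def local_fitness(points, centroids):
    """Assumed fitness: mean squared distance to the nearest centroid,
    computed from the data of a single partition only."""
    total = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return total / len(points)


def weighted_average_fitness(partitions, centroids):
    """Combine per-partition fitness values, weighting each by partition size
    (an assumption; the abstract does not specify the weights)."""
    # Only (size, fitness) pairs leave each partition, not the data points.
    summaries = [(len(part), local_fitness(part, centroids))
                 for part in partitions if part]
    n_total = sum(n for n, _ in summaries)
    return sum(n * f for n, f in summaries) / n_total


# Deliberately unbalanced partitions: three points versus one point.
partitions = [
    [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)],
    [(10.0, 14.0)],
]
# One particle's candidate solution, i.e. a set of cluster centroids.
centroids = [(0.0, 1.0), (10.0, 11.0)]

weighted = weighted_average_fitness(partitions, centroids)
unweighted = sum(local_fitness(p, centroids) for p in partitions) / len(partitions)
all_points = [p for part in partitions for p in part]
global_value = local_fitness(all_points, centroids)
```

With these assumed choices, the size-weighted average reproduces the fitness computed over the whole data set, whereas a plain average of the two local values lets the one-point partition dominate. That is one way to read the abstract's claim that weighting reduces the effect of unbalanced partitions.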
id	pubmed-9955697
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9955697 2023-02-25 Entropy (Basel) Article MDPI 2023-01-31 /pmc/articles/PMC9955697/ /pubmed/36832627 http://dx.doi.org/10.3390/e25020259 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
