
Swarm Intelligence Algorithms in Text Document Clustering with Various Benchmarks

Text document clustering refers to the unsupervised classification of textual documents into clusters based on content similarity and can be applied in applications such as search optimization and extracting hidden information from data generated by IoT sensors. Swarm intelligence (SI) algorithms us...

Bibliographic Details
Main Authors: Selvaraj, Suganya, Choi, Eunmi
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8125674/
https://www.ncbi.nlm.nih.gov/pubmed/34064491
http://dx.doi.org/10.3390/s21093196
_version_ 1783693572829085696
author Selvaraj, Suganya
Choi, Eunmi
author_facet Selvaraj, Suganya
Choi, Eunmi
author_sort Selvaraj, Suganya
collection PubMed
description Text document clustering refers to the unsupervised classification of textual documents into clusters based on content similarity and can be applied in applications such as search optimization and extracting hidden information from data generated by IoT sensors. Swarm intelligence (SI) algorithms use stochastic and heuristic principles that include simple and unintelligent individuals that follow some simple rules to accomplish very complex tasks. By mapping features of problems to parameters of SI algorithms, SI algorithms can achieve solutions in a flexible, robust, decentralized, and self-organized manner. Compared to traditional clustering algorithms, these solving mechanisms make swarm algorithms suitable for resolving complex document clustering problems. However, each SI algorithm shows a different performance based on its own strengths and weaknesses. In this paper, to find the best performing SI algorithm in text document clustering, we performed a comparative study for the PSO, bat, grey wolf optimization (GWO), and K-means algorithms using six data sets of various sizes, which were created from BBC Sport news and 20 newsgroups. Based on our experimental results, we discuss the features of a document clustering problem with the nature of SI algorithms and conclude that the PSO and GWO SI algorithms are better than K-means, and among those algorithms, the PSO performs best in terms of finding the optimal solution.
format Online
Article
Text
id pubmed-8125674
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8125674 2021-05-17 Swarm Intelligence Algorithms in Text Document Clustering with Various Benchmarks Selvaraj, Suganya Choi, Eunmi Sensors (Basel) Article Text document clustering refers to the unsupervised classification of textual documents into clusters based on content similarity and can be applied in applications such as search optimization and extracting hidden information from data generated by IoT sensors. Swarm intelligence (SI) algorithms use stochastic and heuristic principles that include simple and unintelligent individuals that follow some simple rules to accomplish very complex tasks. By mapping features of problems to parameters of SI algorithms, SI algorithms can achieve solutions in a flexible, robust, decentralized, and self-organized manner. Compared to traditional clustering algorithms, these solving mechanisms make swarm algorithms suitable for resolving complex document clustering problems. However, each SI algorithm shows a different performance based on its own strengths and weaknesses. In this paper, to find the best performing SI algorithm in text document clustering, we performed a comparative study for the PSO, bat, grey wolf optimization (GWO), and K-means algorithms using six data sets of various sizes, which were created from BBC Sport news and 20 newsgroups. Based on our experimental results, we discuss the features of a document clustering problem with the nature of SI algorithms and conclude that the PSO and GWO SI algorithms are better than K-means, and among those algorithms, the PSO performs best in terms of finding the optimal solution. MDPI 2021-05-04 /pmc/articles/PMC8125674/ /pubmed/34064491 http://dx.doi.org/10.3390/s21093196 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Selvaraj, Suganya
Choi, Eunmi
Swarm Intelligence Algorithms in Text Document Clustering with Various Benchmarks
title Swarm Intelligence Algorithms in Text Document Clustering with Various Benchmarks
title_sort swarm intelligence algorithms in text document clustering with various benchmarks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8125674/
https://www.ncbi.nlm.nih.gov/pubmed/34064491
http://dx.doi.org/10.3390/s21093196
work_keys_str_mv AT selvarajsuganya swarmintelligencealgorithmsintextdocumentclusteringwithvariousbenchmarks
AT choieunmi swarmintelligencealgorithmsintextdocumentclusteringwithvariousbenchmarks
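
The description above says that SI algorithms are applied by mapping features of the problem to the parameters of the algorithm. As a rough illustration only, the sketch below shows one common way this mapping is done for PSO-based clustering: each particle encodes a full set of k cluster centroids, and the fitness is the mean cosine distance from each document to its nearest centroid. This is not the authors' implementation. The function names, the fitness definition, the parameter values (w, c1, c2, swarm size, iteration count), and the six toy term-count vectors are all assumptions made for this example.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def fitness(docs, centroids):
    """Mean distance from each document to its nearest centroid (lower is better)."""
    return sum(min(cosine_distance(d, c) for c in centroids) for d in docs) / len(docs)


def assign(docs, centroids):
    """Index of the nearest centroid for every document."""
    return [
        min(range(len(centroids)), key=lambda j: cosine_distance(d, centroids[j]))
        for d in docs
    ]


def pso_cluster(docs, k, n_particles=10, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical PSO clustering sketch: one particle = k candidate centroids."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Initialise every particle with k centroids copied from random documents.
    positions = [[list(rng.choice(docs)) for _ in range(k)] for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in positions]
    pbest_fit = [fitness(docs, p) for p in positions]
    best = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest = [c[:] for c in pbest[best]]
    gbest_fit = pbest_fit[best]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + cognitive + social terms.
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - positions[i][j][d])
                        + c2 * r2 * (gbest[j][d] - positions[i][j][d])
                    )
                    positions[i][j][d] += velocities[i][j][d]
            f = fitness(docs, positions[i])
            if f < pbest_fit[i]:
                pbest_fit[i] = f
                pbest[i] = [c[:] for c in positions[i]]
                if f < gbest_fit:
                    gbest_fit = f
                    gbest = [c[:] for c in positions[i]]
    return gbest, gbest_fit, assign(docs, gbest)


# Toy term-count vectors (made up for illustration, not taken from the paper's data sets).
# Assumed vocabulary: goal, match, team, cpu, disk, linux
docs = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [1, 1, 3, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 2, 2, 1],
    [0, 0, 0, 1, 3, 3],
]

centroids, best_fit, labels = pso_cluster(docs, k=2)
print("best fitness:", best_fit)
print("cluster labels:", labels)
```

Encoding all k centroids in a single particle means the swarm searches over whole clusterings rather than over individual centroids, so the global best is always a complete candidate solution. The bat and GWO algorithms named in the description can reuse the same encoding and fitness function with a different position-update rule.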