
Research and Application of Clustering Algorithm for Text Big Data

In the era of big data, text is a very important store of information in all walks of life. From humanities research to government decision-making, from precision medicine to quantitative finance, from customer management to marketing, massive text, as one of the most important informat...

Bibliographic Details
Main author:	Chen, Zi Li
Format:	Online Article Text
Language:	English
Published:	Hindawi 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200521/ https://www.ncbi.nlm.nih.gov/pubmed/35720917 http://dx.doi.org/10.1155/2022/7042778

author	Chen, Zi Li
collection	PubMed
description	In the era of big data, text is a very important store of information in all walks of life. From humanities research to government decision-making, from precision medicine to quantitative finance, from customer management to marketing, massive text, as one of the most important information carriers, plays an important role everywhere. The text data generated by practical problems in humanities research, the financial industry, marketing, and other fields often has obvious domain characteristics: it contains the professional vocabulary and distinctive language patterns of these fields and is often accompanied by a variety of “noise.” Dealing with such texts is a great challenge under current technical conditions, especially for Chinese texts. Clustering algorithms provide a better solution for processing text big data. Clustering algorithms are the core of cluster analysis. The K-means algorithm is widely used in cluster analysis because its principle is simple to implement and its time complexity is low, but its K value must be preset and its randomly selected initial cluster centers can trap it in a local optimum; other clustering algorithms, such as mean shift clustering, are also applied, together with K-means clustering, to mining text big data. In view of these problems, this paper first extracts and analyzes text big data and then runs experiments with the clustering algorithms. Experimental conclusion: in an analysis limited to large-scale, simple data sets, the traditional K-means algorithm shows low efficiency and reduced accuracy on large-scale text data and is susceptible to the influence of the initial centers and of abnormal data. To address these problems, the K-means cluster analysis algorithm is analyzed and improved to raise its execution efficiency and accuracy on data sets with large data volumes. Mean shift clustering can be regarded as moving many random centers gradually in the direction of maximum density, that is, continuously moving each center to the local mean according to the probability density of the data, finally obtaining multiple maximum-density centers. Mean shift clustering can also be described as a kernel density estimation algorithm.
format	Online Article Text
id	pubmed-9200521
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-9200521 2022-06-16 Research and Application of Clustering Algorithm for Text Big Data Chen, Zi Li Comput Intell Neurosci Research Article Hindawi 2022-06-08 /pmc/articles/PMC9200521/ /pubmed/35720917 http://dx.doi.org/10.1155/2022/7042778 Text en Copyright © 2022 Zi Li Chen. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Research and Application of Clustering Algorithm for Text Big Data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200521/ https://www.ncbi.nlm.nih.gov/pubmed/35720917 http://dx.doi.org/10.1155/2022/7042778
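
The abstract's description of mean shift clustering (many centers moving step by step toward regions of maximum density) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `mean_shift`, the Gaussian kernel, the parameters `bandwidth` and `n_seeds`, and the synthetic two-blob data are all assumptions chosen for illustration. For text, the rows of `points` would be document vectors (for example TF-IDF), which is likewise an assumption.

```python
import numpy as np


def mean_shift(points, bandwidth, n_seeds=40, n_iter=100, tol=1e-5, seed=0):
    """Illustrative Gaussian-kernel mean shift (hypothetical helper, not the paper's code).

    Each center is repeatedly replaced by the kernel-weighted mean of all
    points, which moves it uphill on the kernel density estimate.
    """
    rng = np.random.default_rng(seed)
    # Start from a random subset of the data as the "random centers".
    idx = rng.choice(len(points), size=min(n_seeds, len(points)), replace=False)
    centers = points[idx].astype(float)

    for _ in range(n_iter):
        # Squared distances between every center and every point: (n_centers, n_points).
        diff = centers[:, None, :] - points[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        # Gaussian kernel weights: nearby points pull harder than distant ones.
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        shifted = weights @ points / weights.sum(axis=1, keepdims=True)
        moved = np.linalg.norm(shifted - centers, axis=1).max()
        centers = shifted
        if moved < tol:  # every center has effectively stopped moving
            break

    # Centers that converged to the same density peak are merged into one mode.
    modes = []
    for c in centers:
        if all(np.linalg.norm(c - m) > bandwidth / 2 for m in modes):
            modes.append(c)
    return np.array(modes)


# Synthetic stand-in for document vectors: two well-separated 2-D blobs.
data_rng = np.random.default_rng(42)
data = np.vstack([
    data_rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    data_rng.normal(loc=10.0, scale=0.5, size=(100, 2)),
])
modes = mean_shift(data, bandwidth=1.0)
print(modes)  # one row per density peak that was found
```

Unlike K-means, this sketch needs no preset number of clusters; the count of modes follows from the data and the chosen `bandwidth`, which is the parameter that has to be tuned instead.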
