Text embedding techniques for efficient clustering of twitter data

The World Wide Web abounds with various types of information, such as blogs, social media posts, and news articles. Given this volume of online content, there is a need to understand it deeply in order to use the information for practical applications such as event detectio...

Bibliographic Details
Main authors:	Ravi, Jayasree; Kulkarni, Sushil
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2023
Subjects:	Special Issue
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904526/ https://www.ncbi.nlm.nih.gov/pubmed/36777033 http://dx.doi.org/10.1007/s12065-023-00825-3

_version_	1784883634162892800
author	Ravi, Jayasree; Kulkarni, Sushil
author_facet	Ravi, Jayasree; Kulkarni, Sushil
author_sort	Ravi, Jayasree
collection	PubMed
description	The World Wide Web abounds with various types of information, such as blogs, social media posts, and news articles. Given this volume of online content, there is a need to understand it deeply in order to use the information for practical applications such as event detection, polarity, sentiment analysis and so on. Natural Language Processing (NLP) is the study of such information and is used for text classification, sentiment analysis, and clustering of similar text. NLP makes use of linguistic knowledge and builds machine learning models to analyse textual information. NLP is used in various applications, such as classifying online reviews as positive or negative without actually reading the reviews and feedback. For text analysis, there should be a way to quantify text based on frequency of occurrence, correlation with neighbouring words, contextual similarity of words, etc. One such way is word embedding. This study applies various word embedding techniques to tweets from popular news channels and clusters the resulting vectors using the K-means algorithm. The study finds that Bidirectional Encoder Representations from Transformers (BERT) achieved the highest accuracy rate when used with K-means clustering.
format	Online Article Text
id	pubmed-9904526
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer Berlin Heidelberg
record_format	MEDLINE/PubMed
spelling	pubmed-99045262023-02-08 Text embedding techniques for efficient clustering of twitter data Ravi, Jayasree Kulkarni, Sushil Evol Intell Special Issue World wide web is abundant with various types of information such blogs, social media posts, news articles. With this type of magnitude of online content, there is a need to deeply understand the insights of it in order to make use of the information for practical applications such as event detection, polarity, sentiment analysis and so on. Natural Language Processing (NLP) is the study of such information which is used for text classification, sentiment analysis, clustering of similar text. NLP makes use of linguistic knowledge and build machine learning models to analyse textual information. NLP finds its way in various applications like classification of online review into positive and negative without actually reading the reviews and feedback. For text analysis, there should be a way to quantify the text based on its frequency of occurrence, correlation with neighbouring words, contextual similarity of words, etc. One such way is word embedding. This study applies various word embedding techniques on tweets of popular news channels and clusters the resultant vectors using K-means algorithm. From this study, it is found out that Bidirectional Encoder Representations from Transformers (BERT) has achieved highest accuracy rate when used with K-means clustering. Springer Berlin Heidelberg 2023-02-07 /pmc/articles/PMC9904526/ /pubmed/36777033 http://dx.doi.org/10.1007/s12065-023-00825-3 Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Special Issue Ravi, Jayasree Kulkarni, Sushil Text embedding techniques for efficient clustering of twitter data
title	Text embedding techniques for efficient clustering of twitter data
title_full	Text embedding techniques for efficient clustering of twitter data
title_fullStr	Text embedding techniques for efficient clustering of twitter data
title_full_unstemmed	Text embedding techniques for efficient clustering of twitter data
title_short	Text embedding techniques for efficient clustering of twitter data
title_sort	text embedding techniques for efficient clustering of twitter data
topic	Special Issue
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904526/ https://www.ncbi.nlm.nih.gov/pubmed/36777033 http://dx.doi.org/10.1007/s12065-023-00825-3
work_keys_str_mv	AT ravijayasree textembeddingtechniquesforefficientclusteringoftwitterdata AT kulkarnisushil textembeddingtechniquesforefficientclusteringoftwitterdata
