Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data
Text categorization and sentiment analysis are two of the most typical natural language processing tasks with various emerging applications implemented and utilized in different domains, such as health care and policy making. At the same time, the tremendous growth in the popularity and usage of soc...
Main Authors: | Manias, George; Mavrogiorgou, Argyro; Kiourtis, Athanasios; Symvoulidis, Chrysostomos; Kyriazis, Dimosthenis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer London 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10165589/ https://www.ncbi.nlm.nih.gov/pubmed/37362579 http://dx.doi.org/10.1007/s00521-023-08629-3 |
_version_ | 1785038296176394240 |
---|---|
author | Manias, George Mavrogiorgou, Argyro Kiourtis, Athanasios Symvoulidis, Chrysostomos Kyriazis, Dimosthenis |
author_facet | Manias, George Mavrogiorgou, Argyro Kiourtis, Athanasios Symvoulidis, Chrysostomos Kyriazis, Dimosthenis |
author_sort | Manias, George |
collection | PubMed |
description | Text categorization and sentiment analysis are two of the most typical natural language processing tasks, with various emerging applications implemented and utilized in different domains, such as health care and policy making. At the same time, the tremendous growth in the popularity and usage of social media, such as Twitter, has resulted in an immense increase in user-generated data, mainly represented by the texts in users’ posts. However, analyzing these data and extracting actionable knowledge and added value from them is challenging because of the domain diversity and high multilingualism that characterize them. This highlights the emerging need for domain-agnostic and multilingual solutions. To investigate a portion of these challenges, this research work performs a comparative analysis of multilingual approaches for classifying both the sentiment and the text of an examined multilingual corpus. In this context, four multilingual BERT-based classifiers and a zero-shot classification approach are utilized and compared in terms of their accuracy and applicability in the classification of multilingual data. The comparison has a twofold interpretation. On the one hand, multilingual BERT-based classifiers achieve high performance and transfer inference when trained and fine-tuned on multilingual data. On the other hand, the zero-shot approach presents a novel technique for creating multilingual solutions in a faster, more efficient, and scalable way. It can easily be fitted to new languages and new tasks while achieving relatively good results across many languages. However, when efficiency and scalability are less important than accuracy, it seems that this model, and zero-shot models in general, cannot match fine-tuned and trained multilingual BERT-based classifiers. |
format | Online Article Text |
id | pubmed-10165589 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer London |
record_format | MEDLINE/PubMed |
spelling | pubmed-10165589 2023-05-09 Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data Manias, George Mavrogiorgou, Argyro Kiourtis, Athanasios Symvoulidis, Chrysostomos Kyriazis, Dimosthenis Neural Comput Appl S.I.: Technologies of the 4th Industrial Revolution with applications Text categorization and sentiment analysis are two of the most typical natural language processing tasks, with various emerging applications implemented and utilized in different domains, such as health care and policy making. At the same time, the tremendous growth in the popularity and usage of social media, such as Twitter, has resulted in an immense increase in user-generated data, mainly represented by the texts in users’ posts. However, analyzing these data and extracting actionable knowledge and added value from them is challenging because of the domain diversity and high multilingualism that characterize them. This highlights the emerging need for domain-agnostic and multilingual solutions. To investigate a portion of these challenges, this research work performs a comparative analysis of multilingual approaches for classifying both the sentiment and the text of an examined multilingual corpus. In this context, four multilingual BERT-based classifiers and a zero-shot classification approach are utilized and compared in terms of their accuracy and applicability in the classification of multilingual data. The comparison has a twofold interpretation. On the one hand, multilingual BERT-based classifiers achieve high performance and transfer inference when trained and fine-tuned on multilingual data. On the other hand, the zero-shot approach presents a novel technique for creating multilingual solutions in a faster, more efficient, and scalable way. It can easily be fitted to new languages and new tasks while achieving relatively good results across many languages. However, when efficiency and scalability are less important than accuracy, it seems that this model, and zero-shot models in general, cannot match fine-tuned and trained multilingual BERT-based classifiers. Springer London 2023-05-08 /pmc/articles/PMC10165589/ /pubmed/37362579 http://dx.doi.org/10.1007/s00521-023-08629-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
title | Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data |
topic | S.I.: Technologies of the 4th Industrial Revolution with applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10165589/ https://www.ncbi.nlm.nih.gov/pubmed/37362579 http://dx.doi.org/10.1007/s00521-023-08629-3 |