
Social media text analytics of Malayalam–English code-mixed using deep learning

Zigzag conversational patterns of contents in social media are often perceived as noisy or informal text. Unrestricted usage of vocabulary in social media communications complicates the processing of code-mixed text. This paper accentuates two major aspects of code mixed text: Offensive Language Ide...


Bibliographic Details
Main Authors:	Thara, S., Poornachandran, Prabaharan
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9041283/ https://www.ncbi.nlm.nih.gov/pubmed/35495077 http://dx.doi.org/10.1186/s40537-022-00594-3

author	Thara, S. Poornachandran, Prabaharan
collection	PubMed
description	Zigzag conversational patterns of contents in social media are often perceived as noisy or informal text. Unrestricted usage of vocabulary in social media communications complicates the processing of code-mixed text. This paper accentuates two major aspects of code mixed text: Offensive Language Identification and Sentiment Analysis for Malayalam–English code-mixed data set. The proffered framework addresses 3 key points apropos these tasks—dependencies among features created by embedding methods (Word2Vec and FastText), comparative analysis of deep learning algorithms (uni-/bi-directional models, hybrid models, and transformer approaches), relevance of selective translation and transliteration and hyper-parameter optimization—which ensued in F1-Scores (model’s accuracy) of 0.76 for Forum for Information Retrieval Evaluation (FIRE) 2020 and 0.99 for European Chapter of the Association for Computational Linguistics (EACL) 2021 data sets. A detailed error analysis was also done to give meaningful insights. The submitted strategy turned in the best results among the benchmarked models dealing with Malayalam–English code-mixed messages and it serves as an important step towards societal good.
format	Online Article Text
id	pubmed-9041283
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-9041283 2022-04-27 Social media text analytics of Malayalam–English code-mixed using deep learning Thara, S. Poornachandran, Prabaharan J Big Data Research Springer International Publishing 2022-04-26 2022 /pmc/articles/PMC9041283/ /pubmed/35495077 http://dx.doi.org/10.1186/s40537-022-00594-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Social media text analytics of Malayalam–English code-mixed using deep learning
topic	Research
