Sentimental analysis from imbalanced code-mixed data using machine learning approaches

Knowledge discovery from various perspectives has become a crucial asset in almost all fields. Sentimental analysis is a classification task that classifies a sentence based on the meaning of its context. This paper addresses the class imbalance problem, which is one of the important issues in senti...

Bibliographic Details
Main Authors: Srinivasan, R., Subalalitha, C. N.
Format: Online Article Text
Language: English
Published: Springer US 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7980744/
https://www.ncbi.nlm.nih.gov/pubmed/33776212
http://dx.doi.org/10.1007/s10619-021-07331-4
_version_ 1783667486010376192
author Srinivasan, R.
Subalalitha, C. N.
author_facet Srinivasan, R.
Subalalitha, C. N.
author_sort Srinivasan, R.
collection PubMed
description Knowledge discovery from various perspectives has become a crucial asset in almost all fields. Sentimental analysis is a classification task that classifies a sentence based on the meaning of its context. This paper addresses the class imbalance problem, which is one of the important issues in sentimental analysis. Few works have focused on sentimental analysis with an imbalanced class label distribution. The paper also focuses on another aspect of the problem, which involves a concept called “Code Mixing”. Code-mixed data consists of text alternating between two or more languages. Class imbalance is a commonly noted phenomenon in code-mixed data. Existing works have focused more on analyzing sentiments in monolingual data than in code-mixed data. This paper addresses all these issues and proposes a solution for analyzing sentiments in class-imbalanced code-mixed data using a sampling technique combined with Levenshtein distance metrics. Furthermore, this paper compares the performance of various machine learning approaches, namely the Random Forest classifier, Logistic Regression, the XGBoost classifier, Support Vector Machine, and the Naïve Bayes classifier, using the F1-score.
format Online
Article
Text
id pubmed-7980744
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-7980744 2021-03-23 Sentimental analysis from imbalanced code-mixed data using machine learning approaches Srinivasan, R. Subalalitha, C. N. Distrib Parallel Databases Article Knowledge discovery from various perspectives has become a crucial asset in almost all fields. Sentimental analysis is a classification task that classifies a sentence based on the meaning of its context. This paper addresses the class imbalance problem, which is one of the important issues in sentimental analysis. Few works have focused on sentimental analysis with an imbalanced class label distribution. The paper also focuses on another aspect of the problem, which involves a concept called “Code Mixing”. Code-mixed data consists of text alternating between two or more languages. Class imbalance is a commonly noted phenomenon in code-mixed data. Existing works have focused more on analyzing sentiments in monolingual data than in code-mixed data. This paper addresses all these issues and proposes a solution for analyzing sentiments in class-imbalanced code-mixed data using a sampling technique combined with Levenshtein distance metrics. Furthermore, this paper compares the performance of various machine learning approaches, namely the Random Forest classifier, Logistic Regression, the XGBoost classifier, Support Vector Machine, and the Naïve Bayes classifier, using the F1-score. Springer US 2021-03-20 2023 /pmc/articles/PMC7980744/ /pubmed/33776212 http://dx.doi.org/10.1007/s10619-021-07331-4 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Sentimental analysis from imbalanced code-mixed data using machine learning approaches
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7980744/
https://www.ncbi.nlm.nih.gov/pubmed/33776212
http://dx.doi.org/10.1007/s10619-021-07331-4