Sentimental analysis from imbalanced code-mixed data using machine learning approaches
Knowledge discovery from various perspectives has become a crucial asset in almost all fields. Sentimental analysis is a classification task that classifies a sentence based on its meaning in context. This paper addresses the class imbalance problem, which is one of the important issues in senti...
Main Authors: | Srinivasan, R.; Subalalitha, C. N. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2021 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7980744/ https://www.ncbi.nlm.nih.gov/pubmed/33776212 http://dx.doi.org/10.1007/s10619-021-07331-4 |
_version_ | 1783667486010376192 |
---|---|
author | Srinivasan, R. Subalalitha, C. N. |
author_facet | Srinivasan, R. Subalalitha, C. N. |
author_sort | Srinivasan, R. |
collection | PubMed |
description | Knowledge discovery from various perspectives has become a crucial asset in almost all fields. Sentimental analysis is a classification task that classifies a sentence based on its meaning in context. This paper addresses the class imbalance problem, which is one of the important issues in sentimental analysis. Few works have focused on sentimental analysis with an imbalanced class label distribution. The paper also focuses on another aspect of the problem, which involves a concept called “Code Mixing”. Code-mixed data consists of text alternating between two or more languages. Class imbalance is a commonly noted phenomenon in code-mixed data. Existing works have focused more on analyzing sentiments in monolingual data rather than in code-mixed data. This paper addresses these issues and proposes a solution for analyzing sentiments in class-imbalanced code-mixed data using a sampling technique combined with the Levenshtein distance metric. Furthermore, this paper compares the performance of several machine learning approaches, namely the Random Forest classifier, Logistic Regression, the XGBoost classifier, Support Vector Machine, and the Naïve Bayes classifier, using the F1 score. |
format | Online Article Text |
id | pubmed-7980744 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-7980744 2021-03-23 Sentimental analysis from imbalanced code-mixed data using machine learning approaches Srinivasan, R. Subalalitha, C. N. Distrib Parallel Databases Article Knowledge discovery from various perspectives has become a crucial asset in almost all fields. Sentimental analysis is a classification task that classifies a sentence based on its meaning in context. This paper addresses the class imbalance problem, which is one of the important issues in sentimental analysis. Few works have focused on sentimental analysis with an imbalanced class label distribution. The paper also focuses on another aspect of the problem, which involves a concept called “Code Mixing”. Code-mixed data consists of text alternating between two or more languages. Class imbalance is a commonly noted phenomenon in code-mixed data. Existing works have focused more on analyzing sentiments in monolingual data rather than in code-mixed data. This paper addresses these issues and proposes a solution for analyzing sentiments in class-imbalanced code-mixed data using a sampling technique combined with the Levenshtein distance metric. Furthermore, this paper compares the performance of several machine learning approaches, namely the Random Forest classifier, Logistic Regression, the XGBoost classifier, Support Vector Machine, and the Naïve Bayes classifier, using the F1 score. Springer US 2021-03-20 2023 /pmc/articles/PMC7980744/ /pubmed/33776212 http://dx.doi.org/10.1007/s10619-021-07331-4 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Srinivasan, R. Subalalitha, C. N. Sentimental analysis from imbalanced code-mixed data using machine learning approaches |
title | Sentimental analysis from imbalanced code-mixed data using machine learning approaches |
title_full | Sentimental analysis from imbalanced code-mixed data using machine learning approaches |
title_fullStr | Sentimental analysis from imbalanced code-mixed data using machine learning approaches |
title_full_unstemmed | Sentimental analysis from imbalanced code-mixed data using machine learning approaches |
title_short | Sentimental analysis from imbalanced code-mixed data using machine learning approaches |
title_sort | sentimental analysis from imbalanced code-mixed data using machine learning approaches |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7980744/ https://www.ncbi.nlm.nih.gov/pubmed/33776212 http://dx.doi.org/10.1007/s10619-021-07331-4 |
work_keys_str_mv | AT srinivasanr sentimentalanalysisfromimbalancedcodemixeddatausingmachinelearningapproaches AT subalalithacn sentimentalanalysisfromimbalancedcodemixeddatausingmachinelearningapproaches |