
Persian sentiment analysis of an online store independent of pre-processing using convolutional neural network with fastText embeddings


Bibliographic Details
Main Authors: Shumaly, Sajjad; Yazdinejad, Mohsen; Guo, Yanhui
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Artificial Intelligence
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959661/
https://www.ncbi.nlm.nih.gov/pubmed/33817057
http://dx.doi.org/10.7717/peerj-cs.422
author Shumaly, Sajjad
Yazdinejad, Mohsen
Guo, Yanhui
collection PubMed
description Sentiment analysis plays a key role for companies, especially retailers: determining customers’ opinions about products more accurately helps them stay competitive. We analyze user opinions on the website of Digikala, the largest online store in Iran. Persian text is largely unstructured, which makes pre-processing very difficult; this is the main obstacle to sentiment analysis in Persian. The problem is compounded by the scarcity of libraries for Persian pre-processing, since most libraries target English. To address this, approximately 3 million Persian reviews were collected from the Digikala website using web-mining techniques, and fastText was used to train a word embedding on them. The assumption was that the skip-gram method, which takes into account the position of words in a sentence and their relations to each other, would dramatically reduce the need for text pre-processing. A second representation was built with TF-IDF, in parallel with fastText, so that their performance could be compared. In addition, the results of Convolutional Neural Network (CNN), BiLSTM, Logistic Regression, and Naïve Bayes models were compared. Notably, fastText combined with CNN achieved 0.996 AUC and 0.956 F-score. The article demonstrates to what extent pre-processing can be dispensed with, and the accuracy obtained exceeds that of other studies on Persian. Avoiding complex text pre-processing also matters for other languages, since most pre-processing algorithms were developed for English and cannot be used for other languages. Because of its high accuracy and independence from pre-processing, the resulting word embedding has applications in Persian beyond sentiment analysis.
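The description reports AUC and F-score without defining them. As a minimal sketch, assuming binary sentiment labels (1 = positive, 0 = negative) and assuming that "F-score" means F1, the two metrics can be computed as below. The function names and the toy data are illustrative only and are not taken from the article.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1: harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration (not results from the article).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

print(f1_score(labels, preds), roc_auc(labels, scores))
```

On this toy data, F1 is 2/3 (two true positives, one false positive, one false negative) and AUC is 8/9 (eight of the nine positive/negative pairs are ranked correctly). F1 depends on the chosen decision threshold, while AUC depends only on how the scores rank the examples, which is why the two are usually reported together.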
format Online
Article
Text
id pubmed-7959661
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7959661 2021-04-02
PeerJ Comput Sci
Artificial Intelligence
PeerJ Inc. 2021-03-05
/pmc/articles/PMC7959661/ /pubmed/33817057 http://dx.doi.org/10.7717/peerj-cs.422
Text en
© 2021 Shumaly et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Persian sentiment analysis of an online store independent of pre-processing using convolutional neural network with fastText embeddings
topic Artificial Intelligence