
Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews

People nowadays use the internet to project their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source to gather data for data analytics, sentiment analysis, natural language processing, etc. Con...


Bibliographic Details
Main authors: Chatterjee, Ishani; Zhou, Mengchu; Abusorrah, Abdullah; Sedraoui, Khaled; Alabdulwahab, Ahmed
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8700267/
https://www.ncbi.nlm.nih.gov/pubmed/34945950
http://dx.doi.org/10.3390/e23121645
_version_ 1784620715866062848
author Chatterjee, Ishani
Zhou, Mengchu
Abusorrah, Abdullah
Sedraoui, Khaled
Alabdulwahab, Ahmed
author_sort Chatterjee, Ishani
collection PubMed
description People nowadays use the internet to project their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source to gather data for data analytics, sentiment analysis, natural language processing, etc. Conventionally, the true sentiment of a customer review matches its corresponding star rating. There are exceptions when the star rating of a review is opposite to its true nature. These are labeled as the outliers in a dataset in this work. The state-of-the-art methods for anomaly detection involve manual searching, predefined rules, or traditional machine learning techniques to detect such instances. This paper conducts a sentiment analysis and outlier detection case study for Amazon customer reviews, and it proposes a statistics-based outlier detection and correction method (SODCM), which helps identify such reviews and rectify their star ratings to enhance the performance of a sentiment analysis algorithm without any data loss. This paper focuses on performing SODCM in datasets containing customer reviews of various products, which are (a) scraped from Amazon.com and (b) publicly available. The paper also studies the dataset and concludes the effect of SODCM on the performance of a sentiment analysis algorithm. The results exhibit that SODCM achieves higher accuracy and recall percentage than other state-of-the-art anomaly detection algorithms.
format Online
Article
Text
id pubmed-8700267
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8700267 2021-12-24 Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews Chatterjee, Ishani Zhou, Mengchu Abusorrah, Abdullah Sedraoui, Khaled Alabdulwahab, Ahmed Entropy (Basel) Article MDPI 2021-12-07 /pmc/articles/PMC8700267/ /pubmed/34945950 http://dx.doi.org/10.3390/e23121645 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews
topic Article