
Confounds and overestimations in fake review detection: Experimentally controlling for product-ownership and data-origin

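The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods utilizing natural language processing approaches to detect fake product reviews. However, studies vary considerably in how well they succeed in detecting deceptive reviews, and the reasons for such differences are unclear. A contributing factor may be the multitude of strategies used to collect data, introducing potential confounds which affect detection performance. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., reviews written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds for fake review detection. Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26–69.87%) is somewhat detectable but reviews additionally confounded with product-ownership (66.19–74.17%), or with data-origin (84.44–86.94%) are easier to classify. Review veracity is most easily classified if confounded with product-ownership and data-origin combined (87.78–88.12%). These findings are moderated by review polarity. Overall, our findings suggest that detection accuracy may have been overestimated in previous studies, provide possible explanations as to why, and indicate how future studies might be designed to provide less biased estimates of detection accuracy.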


Bibliographic Details
Main authors:	Soldner, Felix; Kleinberg, Bennett; Johnson, Shane D.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9728858/ https://www.ncbi.nlm.nih.gov/pubmed/36477257 http://dx.doi.org/10.1371/journal.pone.0277869

author	Soldner, Felix; Kleinberg, Bennett; Johnson, Shane D.
collection	PubMed
description	The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods utilizing natural language processing approaches to detect fake product reviews. However, studies vary considerably in how well they succeed in detecting deceptive reviews, and the reasons for such differences are unclear. A contributing factor may be the multitude of strategies used to collect data, introducing potential confounds which affect detection performance. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., reviews written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds for fake review detection. Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26–69.87%) is somewhat detectable but reviews additionally confounded with product-ownership (66.19–74.17%), or with data-origin (84.44–86.94%) are easier to classify. Review veracity is most easily classified if confounded with product-ownership and data-origin combined (87.78–88.12%). These findings are moderated by review polarity. Overall, our findings suggest that detection accuracy may have been overestimated in previous studies, provide possible explanations as to why, and indicate how future studies might be designed to provide less biased estimates of detection accuracy.
format	Online Article Text
id	pubmed-9728858
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9728858, 2022-12-08. Confounds and overestimations in fake review detection: Experimentally controlling for product-ownership and data-origin. Soldner, Felix; Kleinberg, Bennett; Johnson, Shane D. PLoS One, Research Article.
Public Library of Science, 2022-12-07. /pmc/articles/PMC9728858/ /pubmed/36477257 http://dx.doi.org/10.1371/journal.pone.0277869. Text, en. © 2022 Soldner et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Confounds and overestimations in fake review detection: Experimentally controlling for product-ownership and data-origin
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9728858/ https://www.ncbi.nlm.nih.gov/pubmed/36477257 http://dx.doi.org/10.1371/journal.pone.0277869

