
Comparing writing style feature-based classification methods for estimating user reputations in social media

In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the qualities of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features...


Bibliographic Details
Main author: Suh, Jong Hwan
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4775724/
https://www.ncbi.nlm.nih.gov/pubmed/27006870
http://dx.doi.org/10.1186/s40064-016-1841-1
author Suh, Jong Hwan
collection PubMed
description In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the qualities of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of state-of-the-art methods: four types of writing style features (lexical, syntactic, structural, and content-specific) and eight classification techniques, namely four base learners, C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB), and four Random Subspace (RS) ensemble methods built on those base learners. With South Korea’s Web forum Daum Agora as the test bed, the experimental results show that the full feature set, which includes content-specific features, combined with RS-SVM (RS applied to SVM) gives the best classification accuracy when the test bed poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations coming from the literature reviews: first, the feature set that adds content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations (like, dislike, sum, and portfolio), the results show that the portfolio approach gives the highest accuracy.
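As an illustration of the Random Subspace (RS) idea named in the description, the following minimal sketch trains each ensemble member on a random subset of feature columns and combines the members by majority vote. It is not the study's implementation: the base learner is a tiny nearest-centroid classifier swapped in for C4.5, NN, SVM, and NB so that the example needs only the Python standard library, and the class names, parameters, and toy feature values are all hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (an assumption; not one of the four in the abstract)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        # One mean vector per class label.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        # Return the label whose centroid is closest to the row.
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


class RandomSubspaceEnsemble:
    """Each member sees only a random subset of the feature columns."""

    def __init__(self, n_estimators=10, subspace_size=2, seed=0):
        self.n_estimators = n_estimators
        self.subspace_size = subspace_size
        self.rng = random.Random(seed)
        self.members = []

    def fit(self, X, y):
        n_features = len(X[0])
        self.members = []
        for _ in range(self.n_estimators):
            # Sample feature indices without replacement for this member.
            cols = sorted(self.rng.sample(range(n_features), self.subspace_size))
            projected = [[row[j] for j in cols] for row in X]
            self.members.append((cols, NearestCentroid().fit(projected, y)))
        return self

    def predict_one(self, row):
        # Majority vote over members, each using its own feature subset.
        votes = Counter(
            model.predict_one([row[j] for j in cols]) for cols, model in self.members
        )
        return votes.most_common(1)[0][0]


# Hypothetical toy data: four numeric writing-style features per poster.
X_train = [
    [5.0, 4.0, 6.0, 5.5],
    [6.0, 5.0, 5.0, 6.5],
    [1.0, 0.5, 1.5, 1.0],
    [0.5, 1.0, 1.0, 0.5],
]
y_train = ["Good", "Good", "Bad", "Bad"]

ensemble = RandomSubspaceEnsemble(n_estimators=5, subspace_size=2, seed=42)
ensemble.fit(X_train, y_train)
print(ensemble.predict_one([5.0, 5.0, 5.0, 5.0]))
print(ensemble.predict_one([1.0, 1.0, 1.0, 1.0]))
```

Sampling features rather than training rows is what distinguishes the random subspace method from bagging; with a real library, the same structure would wrap a stronger base learner such as an SVM.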
format Online
Article
Text
id pubmed-4775724
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-4775724 2016-03-22 Comparing writing style feature-based classification methods for estimating user reputations in social media Suh, Jong Hwan Springerplus Research Springer International Publishing 2016-03-02 /pmc/articles/PMC4775724/ /pubmed/27006870 http://dx.doi.org/10.1186/s40064-016-1841-1 Text en © Suh. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title Comparing writing style feature-based classification methods for estimating user reputations in social media
topic Research