
A study on real-time low-quality content detection on Twitter from the users’ perspective

Detection techniques of malicious content such as spam and phishing on Online Social Networks (OSN) are common with little attention paid to other types of low-quality content which actually impacts users’ content browsing experience most. The aim of our work is to detect low-quality content from th...


Bibliographic Details
Main Authors: Chen, Weiling; Yeo, Chai Kiat; Lau, Chiew Tong; Lee, Bu Sung
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549928/
https://www.ncbi.nlm.nih.gov/pubmed/28793347
http://dx.doi.org/10.1371/journal.pone.0182487
author Chen, Weiling
Yeo, Chai Kiat
Lau, Chiew Tong
Lee, Bu Sung
collection PubMed
description Detection techniques of malicious content such as spam and phishing on Online Social Networks (OSN) are common with little attention paid to other types of low-quality content which actually impacts users’ content browsing experience most. The aim of our work is to detect low-quality content from the users’ perspective in real time. To define low-quality content comprehensibly, Expectation Maximization (EM) algorithm is first used to coarsely classify low-quality tweets into four categories. Based on this preliminary study, a survey is carefully designed to gather users’ opinions on different categories of low-quality content. Both direct and indirect features including newly proposed features are identified to characterize all types of low-quality content. We then further combine word level analysis with the identified features and build a keyword blacklist dictionary to improve the detection performance. We manually label an extensive Twitter dataset of 100,000 tweets and perform low-quality content detection in real time based on the characterized significant features and word level analysis. The results of our research show that our method has a high accuracy of 0.9711 and a good F1 of 0.8379 based on a random forest classifier with real time performance in the detection of low-quality content in tweets. Our work therefore achieves a positive impact in improving user experience in browsing social media content.
format Online
Article
Text
id pubmed-5549928
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5549928 2017-08-15 A study on real-time low-quality content detection on Twitter from the users’ perspective Chen, Weiling Yeo, Chai Kiat Lau, Chiew Tong Lee, Bu Sung PLoS One Research Article
Public Library of Science 2017-08-09 /pmc/articles/PMC5549928/ /pubmed/28793347 http://dx.doi.org/10.1371/journal.pone.0182487 Text en © 2017 Chen et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A study on real-time low-quality content detection on Twitter from the users’ perspective
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549928/
https://www.ncbi.nlm.nih.gov/pubmed/28793347
http://dx.doi.org/10.1371/journal.pone.0182487