
Automatic Detection of Pornographic and Gambling Websites Based on Visual and Textual Content Using a Decision Mechanism

Pornographic and gambling websites have become increasingly stubborn through disguising, misleading, blocking, and bypassing, which hinders the construction of a safe and healthy network environment. However, most traditional approaches conduct the detection process through a single aspect of these sites, whi...


Bibliographic Details
Main Authors: Chen, Yang; Zheng, Rongfeng; Zhou, Anmin; Liao, Shan; Liu, Liang
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7411926/
https://www.ncbi.nlm.nih.gov/pubmed/32709067
http://dx.doi.org/10.3390/s20143989
_version_ 1783568491882741760
author Chen, Yang
Zheng, Rongfeng
Zhou, Anmin
Liao, Shan
Liu, Liang
collection PubMed
description Pornographic and gambling websites have become increasingly stubborn through disguising, misleading, blocking, and bypassing, which hinders the construction of a safe and healthy network environment. However, most traditional approaches detect these sites through a single aspect, which fails to handle more intricate and challenging situations. To alleviate this problem, this study proposed an automatic detection system for porn and gambling websites based on visual and textual content using a decision mechanism (PG-VTDM). This system can be deployed on an intelligent wireless router at home or school to identify, block, and warn about unsuitable websites. First, Doc2Vec was employed to learn textual features that represent the textual content in the hypertext markup language (HTML) source code of the websites. In addition, the traditional bag-of-visual-words (BoVW) model was improved by introducing local spatial relationships of feature points to better represent the visual features of the website screenshot. Then, based on these two types of features, a text classifier and an image classifier were trained. In the decision mechanism, a data fusion algorithm based on logistic regression (LR) was designed to obtain the final prediction by measuring the contribution of the two classification results to the final category prediction. The efficiency of the proposed approach was substantiated via comparison experiments using gambling and porn website datasets crawled from the Internet. The proposed approach outperformed approaches based on a single feature and some state-of-the-art approaches, with accuracy, precision, and F-measure all over 99%.
format Online
Article
Text
id pubmed-7411926
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7411926 2020-08-25 Automatic Detection of Pornographic and Gambling Websites Based on Visual and Textual Content Using a Decision Mechanism Chen, Yang Zheng, Rongfeng Zhou, Anmin Liao, Shan Liu, Liang Sensors (Basel) Article
MDPI 2020-07-17 /pmc/articles/PMC7411926/ /pubmed/32709067 http://dx.doi.org/10.3390/s20143989 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Automatic Detection of Pornographic and Gambling Websites Based on Visual and Textual Content Using a Decision Mechanism
topic Article
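The abstract in this record describes a decision mechanism that fuses the outputs of a text classifier and an image classifier with logistic regression (LR). The following is a minimal sketch of that general idea in Python, not the authors' implementation. It assumes each classifier emits a probability that a site belongs to the target class. The function names (`sigmoid`, `fuse`, `train_fusion`), the synthetic scores, the learning rate, and the epoch count are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(weights, bias, p_text, p_image):
    """Combine the two classifier probabilities into one fused probability."""
    z = weights[0] * p_text + weights[1] * p_image + bias
    return sigmoid(z)


def train_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit (w_text, w_image) and a bias by batch gradient descent on log loss."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (p_text, p_image), y in zip(scores, labels):
            err = fuse(weights, bias, p_text, p_image) - y
            grad_w[0] += err * p_text
            grad_w[1] += err * p_image
            grad_b += err
        weights[0] -= lr * grad_w[0] / n
        weights[1] -= lr * grad_w[1] / n
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic (hypothetical) classifier outputs: the text score is made the
# more reliable signal, the image score the noisier one.
random.seed(0)
scores, labels = [], []
for _ in range(100):
    scores.append((random.uniform(0.6, 1.0), random.uniform(0.3, 1.0)))
    labels.append(1)  # target-class site
    scores.append((random.uniform(0.0, 0.4), random.uniform(0.0, 0.7)))
    labels.append(0)  # benign site

weights, bias = train_fusion(scores, labels)
predictions = [1 if fuse(weights, bias, pt, pi) >= 0.5 else 0 for pt, pi in scores]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("weights:", weights, "bias:", bias, "training accuracy:", accuracy)
```

Because both inputs are on the same 0 to 1 scale, the relative size of the two learned weights gives a rough indication of how much the fused decision leans on each classifier, which is one way to read "measuring the contribution of the two classification results".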