
Stratification-Based Outlier Detection over the Deep Web

For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the...


Bibliographic Details
Main Authors: Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S.; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4897663/
https://www.ncbi.nlm.nih.gov/pubmed/27313603
http://dx.doi.org/10.1155/2016/7386517
_version_ 1782436209550688256
author Xian, Xuefeng
Zhao, Pengpeng
Sheng, Victor S.
Fang, Ligang
Gu, Caidong
Yang, Yuanfeng
Cui, Zhiming
author_sort Xian, Xuefeng
collection PubMed
description For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web.
format Online
Article
Text
id pubmed-4897663
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4897663 2016-06-16 Comput Intell Neurosci Research Article Hindawi Publishing Corporation 2016 2016-05-25 /pmc/articles/PMC4897663/ /pubmed/27313603 http://dx.doi.org/10.1155/2016/7386517 Text en Copyright © 2016 Xuefeng Xian et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Stratification-Based Outlier Detection over the Deep Web
topic Research Article