
A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data

Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is kn...


Bibliographic Details
Main authors: Goldstein, Markus; Uchida, Seiichi
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4836738/
https://www.ncbi.nlm.nih.gov/pubmed/27093601
http://dx.doi.org/10.1371/journal.pone.0152173
author Goldstein, Markus
Uchida, Seiichi
collection PubMed
description Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, computational effort, the impact of parameter settings as well as the global/local anomaly detection behavior are outlined. As a conclusion, we give advice on algorithm selection for typical real-world tasks.
format Online
Article
Text
id pubmed-4836738
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4836738 2016-04-29 A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data Goldstein, Markus Uchida, Seiichi PLoS One Research Article
Public Library of Science 2016-04-19 /pmc/articles/PMC4836738/ /pubmed/27093601 http://dx.doi.org/10.1371/journal.pone.0152173 Text en © 2016 Goldstein, Uchida http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data
topic Research Article