A Pattern Dictionary Method for Anomaly Detection

In this paper, we propose a compression-based anomaly detection method for time series and sequence data using a pattern dictionary. The proposed method is capable of learning complex patterns in a training data sequence, using these learned patterns to detect potentially anomalous patterns in a tes...

Bibliographic Details
Main Authors: Sabeti, Elyas; Oh, Sehong; Song, Peter X. K.; Hero, Alfred O.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407188/
https://www.ncbi.nlm.nih.gov/pubmed/36010758
http://dx.doi.org/10.3390/e24081095
author Sabeti, Elyas
Oh, Sehong
Song, Peter X. K.
Hero, Alfred O.
collection PubMed
description In this paper, we propose a compression-based anomaly detection method for time series and sequence data using a pattern dictionary. The proposed method is capable of learning complex patterns in a training data sequence, using these learned patterns to detect potentially anomalous patterns in a test data sequence. The proposed pattern dictionary method uses a measure of complexity of the test sequence as an anomaly score that can be used to perform stand-alone anomaly detection. We also show that when combined with a universal source coder, the proposed pattern dictionary yields a powerful atypicality detector that is equally applicable to anomaly detection. The pattern dictionary-based atypicality detector uses an anomaly score defined as the difference between the complexity of the test sequence data encoded by the trained pattern dictionary (typical) encoder and the universal (atypical) encoder, respectively. We consider two complexity measures: the number of parsed phrases in the sequence, and the length of the encoded sequence (codelength). Specializing to a particular type of universal encoder, the Tree-Structured Lempel–Ziv (LZ78), we obtain a novel non-asymptotic upper bound, in terms of the Lambert W function, on the number of distinct phrases resulting from the LZ78 parser. This non-asymptotic bound determines the range of anomaly score. As a concrete application, we illustrate the pattern dictionary framework for constructing a baseline of health against which anomalous deviations can be detected.
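The description names the number of parsed phrases under LZ78 as one of its two complexity measures. As a rough illustration only, and not the authors' implementation, the standard LZ78 incremental parsing rule can be sketched as follows. The function names `lz78_phrases` and `phrase_count` are hypothetical, and the trained pattern dictionary and the anomaly score itself are not reproduced here.

```python
def lz78_phrases(seq):
    """Split seq into LZ78 phrases (standard incremental parsing).

    Each phrase is the shortest prefix of the remaining input that has
    not been seen as a phrase before. Illustrative sketch only.
    """
    seen = set()      # phrases parsed so far
    phrases = []
    current = ""
    for symbol in seq:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing leftover that repeats an earlier phrase.
        phrases.append(current)
    return phrases


def phrase_count(seq):
    """Number of parsed phrases, used here as a simple complexity measure."""
    return len(lz78_phrases(seq))


if __name__ == "__main__":
    print(lz78_phrases("abababab"))
    print(phrase_count("a" * 36), phrase_count("abcdefgh"))
```

A highly repetitive sequence parses into few, long phrases relative to its length, while a sequence of novel symbols produces one phrase per symbol, which is the intuition behind using the phrase count as a complexity measure.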
format Online
Article
Text
id pubmed-9407188
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9407188 2022-08-26 A Pattern Dictionary Method for Anomaly Detection. Sabeti, Elyas; Oh, Sehong; Song, Peter X. K.; Hero, Alfred O. Entropy (Basel), Article. MDPI, 2022-08-09. /pmc/articles/PMC9407188/ /pubmed/36010758 http://dx.doi.org/10.3390/e24081095 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Pattern Dictionary Method for Anomaly Detection
topic Article