A Pattern Dictionary Method for Anomaly Detection
In this paper, we propose a compression-based anomaly detection method for time series and sequence data using a pattern dictionary. The proposed method is capable of learning complex patterns in a training data sequence, using these learned patterns to detect potentially anomalous patterns in a tes...
Main authors: | Sabeti, Elyas; Oh, Sehong; Song, Peter X. K.; Hero, Alfred O. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407188/ https://www.ncbi.nlm.nih.gov/pubmed/36010758 http://dx.doi.org/10.3390/e24081095 |
_version_ | 1784774303609257984 |
---|---|
author | Sabeti, Elyas Oh, Sehong Song, Peter X. K. Hero, Alfred O. |
author_facet | Sabeti, Elyas Oh, Sehong Song, Peter X. K. Hero, Alfred O. |
author_sort | Sabeti, Elyas |
collection | PubMed |
description | In this paper, we propose a compression-based anomaly detection method for time series and sequence data using a pattern dictionary. The proposed method is capable of learning complex patterns in a training data sequence, using these learned patterns to detect potentially anomalous patterns in a test data sequence. The proposed pattern dictionary method uses a measure of complexity of the test sequence as an anomaly score that can be used to perform stand-alone anomaly detection. We also show that when combined with a universal source coder, the proposed pattern dictionary yields a powerful atypicality detector that is equally applicable to anomaly detection. The pattern dictionary-based atypicality detector uses an anomaly score defined as the difference between the complexity of the test sequence data encoded by the trained pattern dictionary (typical) encoder and the universal (atypical) encoder, respectively. We consider two complexity measures: the number of parsed phrases in the sequence, and the length of the encoded sequence (codelength). Specializing to a particular type of universal encoder, the Tree-Structured Lempel–Ziv (LZ78), we obtain a novel non-asymptotic upper bound, in terms of the Lambert W function, on the number of distinct phrases resulting from the LZ78 parser. This non-asymptotic bound determines the range of anomaly score. As a concrete application, we illustrate the pattern dictionary framework for constructing a baseline of health against which anomalous deviations can be detected. |
format | Online Article Text |
id | pubmed-9407188 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9407188 2022-08-26 A Pattern Dictionary Method for Anomaly Detection Sabeti, Elyas Oh, Sehong Song, Peter X. K. Hero, Alfred O. Entropy (Basel) Article MDPI 2022-08-09 /pmc/articles/PMC9407188/ /pubmed/36010758 http://dx.doi.org/10.3390/e24081095 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Sabeti, Elyas Oh, Sehong Song, Peter X. K. Hero, Alfred O. A Pattern Dictionary Method for Anomaly Detection |
title | A Pattern Dictionary Method for Anomaly Detection |
title_full | A Pattern Dictionary Method for Anomaly Detection |
title_fullStr | A Pattern Dictionary Method for Anomaly Detection |
title_full_unstemmed | A Pattern Dictionary Method for Anomaly Detection |
title_short | A Pattern Dictionary Method for Anomaly Detection |
title_sort | pattern dictionary method for anomaly detection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407188/ https://www.ncbi.nlm.nih.gov/pubmed/36010758 http://dx.doi.org/10.3390/e24081095 |
work_keys_str_mv | AT sabetielyas apatterndictionarymethodforanomalydetection AT ohsehong apatterndictionarymethodforanomalydetection AT songpeterxk apatterndictionarymethodforanomalydetection AT heroalfredo apatterndictionarymethodforanomalydetection AT sabetielyas patterndictionarymethodforanomalydetection AT ohsehong patterndictionarymethodforanomalydetection AT songpeterxk patterndictionarymethodforanomalydetection AT heroalfredo patterndictionarymethodforanomalydetection |
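
The description field above defines the anomaly score as the difference between the complexity of a test sequence under a trained pattern dictionary (typical) encoder and under a universal (atypical) encoder such as LZ78, with the number of parsed phrases as one complexity measure. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the `max_len` depth parameter, the greedy longest-match parsing rule, and the handling of unseen symbols as single-symbol phrases are all assumptions made for illustration.

```python
def lz78_phrase_count(seq):
    """Count phrases from LZ78 incremental parsing (universal encoder)."""
    seen = set()
    phrase = ""
    count = 0
    for symbol in seq:
        phrase += symbol
        if phrase not in seen:
            # Shortest prefix not yet in the dictionary: close the phrase.
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:
        # Trailing phrase that was already in the dictionary.
        count += 1
    return count


def build_pattern_dictionary(train, max_len):
    """Collect every substring of the training sequence up to max_len.

    max_len is a hypothetical depth parameter chosen for this sketch.
    """
    return {
        train[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(train) - k + 1)
    }


def dictionary_phrase_count(test, patterns, max_len):
    """Greedy longest-match parse of the test sequence (typical encoder)."""
    i = 0
    count = 0
    while i < len(test):
        step = 1  # assumption: an unseen symbol forms its own phrase
        for k in range(min(max_len, len(test) - i), 0, -1):
            if test[i:i + k] in patterns:
                step = k
                break
        i += step
        count += 1
    return count


def anomaly_score(test, patterns, max_len):
    """Typical-encoder phrase count minus universal-encoder phrase count."""
    return dictionary_phrase_count(test, patterns, max_len) - lz78_phrase_count(test)
```

With a dictionary built from the training string `"abcabcabcabc"` and `max_len=4`, the test string `"abcabcab"` parses into 2 dictionary phrases against 6 LZ78 phrases, giving a score of -4, while `"xyzxyzxy"` parses into 8 single-symbol phrases against 6 LZ78 phrases, giving a score of 2. In this sketch a larger score indicates that the trained dictionary describes the test sequence worse than the universal parser does.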