
Searching for Complex Patterns Using Disjunctive Anomaly Detection

OBJECTIVE: The disjunctive anomaly detection (DAD) algorithm [1] can efficiently search across multidimensional biosurveillance data to find multiple simultaneously occurring (in time) and overlapping (across different data dimensions) anomalous clusters. We introduce extensions of DAD to handle rich cl...


Bibliographic Details
Main authors: Sabhnani, Maheshkumar, Dubrawski, Artur, Schneider, Jeff
Format: Online Article Text
Language: English
Published: University of Illinois at Chicago Library 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692787/
_version_ 1782274655186321408
author Sabhnani, Maheshkumar
Dubrawski, Artur
Schneider, Jeff
author_facet Sabhnani, Maheshkumar
Dubrawski, Artur
Schneider, Jeff
author_sort Sabhnani, Maheshkumar
collection PubMed
description OBJECTIVE: The disjunctive anomaly detection (DAD) algorithm [1] can efficiently search multidimensional biosurveillance data for multiple anomalous clusters that occur simultaneously in time and overlap across data dimensions. We introduce extensions of DAD that handle rich cluster interactions and diverse data distributions.
INTRODUCTION: Modern biosurveillance data contain thousands of unique time series defined across various categorical dimensions (zipcode, age group, hospital). Many algorithms are either overly specific (tracking each time series independently often misses early signs of outbreaks) or too general (detections at the state level may lack the specificity needed to reflect the actual process at hand). Disease outbreaks often affect multiple values (disjunctive sets of zipcodes, hospitals, or age groups) along subsets of the data dimensions. It is not uncommon for outbreaks of different diseases to occur simultaneously (e.g., food poisoning and flu), which makes it hard to detect and characterize the individual events. We proposed the DAD algorithm [1] to efficiently search across millions of potential clusters defined as conjunctions over dimensions and disjunctions over values along each dimension. For example, an anomalous cluster detectable by DAD may identify zipcode = {z1 or z2 or z3 or z5} and age_group = {child or senior} as showing unusual activity in the aggregate. This conjunctive-disjunctive language of cluster definitions enables finding real-world outbreaks that are often missed by other state-of-the-art algorithms such as What’s Strange About Recent Events (WSARE) [3] or Large Average Submatrix (LAS) [2]. DAD can identify multiple interesting clusters simultaneously and explains complex anomalies in data better than those alternatives.
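The conjunctive-disjunctive cluster language described in the INTRODUCTION can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Cluster`, `matches`, and `aggregate_count`, and the sample rows, are hypothetical and chosen only to mirror the zipcode/age_group example from the abstract. A record belongs to a cluster when, for every constrained dimension (AND), its value is one of the listed values (OR).

```python
from typing import FrozenSet, Iterable, Mapping, Tuple

# Hypothetical representation: each constrained dimension maps to the set
# of values the cluster accepts along that dimension.
Cluster = Mapping[str, FrozenSet[str]]
Record = Mapping[str, str]


def matches(record: Record, cluster: Cluster) -> bool:
    """Conjunction over dimensions, disjunction over values per dimension."""
    return all(record[dim] in values for dim, values in cluster.items())


def aggregate_count(rows: Iterable[Tuple[Record, int]], cluster: Cluster) -> int:
    """Sum the counts of all records that fall inside the cluster."""
    return sum(count for record, count in rows if matches(record, cluster))


# The example cluster from the abstract.
cluster = {
    "zipcode": frozenset({"z1", "z2", "z3", "z5"}),
    "age_group": frozenset({"child", "senior"}),
}

# Made-up daily counts, for illustration only.
rows = [
    ({"zipcode": "z1", "age_group": "child"}, 4),   # inside the cluster
    ({"zipcode": "z4", "age_group": "child"}, 7),   # z4 is not listed
    ({"zipcode": "z5", "age_group": "adult"}, 3),   # adult is not listed
    ({"zipcode": "z3", "age_group": "senior"}, 2),  # inside the cluster
]

total = aggregate_count(rows, cluster)
print(total)  # 6
```

Dimensions that the cluster does not mention are left unconstrained, which is one plausible way to express "subsets of multiple dimensions"; the abstract itself does not specify the encoding.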
METHODS: We define the observed count of patients reporting on a given day as a random variable for each unique combination of values along all dimensions. DAD iteratively identifies K subsets of these variables, along with the corresponding ranges of their values and the time intervals, that show increased activity that cannot be explained by random fluctuations (K is generally unknown and could be 0). The resulting set of clusters maximizes data likelihood while controlling for overall complexity. We have derived a versatile set of scoring functions that allow Normal, Poisson, Exponential, or non-parametric assumptions about the underlying data distributions and accommodate additive-scaled, additive-unscaled, or multiplicative-scaled models for the clusters.
RESULTS: We present results of testing DAD on two real-world datasets. One contains daily outpatient visit counts from 26 regions in Sri Lanka involving 9 common diseases. The other contains semi-synthetically generated terrorist activities throughout regions of Afghanistan (Sigacts). Both span multiple years and are representative of data seen in biosurveillance applications. Figure 1 shows DAD systematically outperforming WSARE and LAS. Each algorithm’s parameters were tuned to generate one false positive per month in baseline data. The graphs represent average days-to-detect performance over 100 sets with synthetically injected clusters using additive-scaled (AS), additive-unscaled (AU), and multiplicative-scaled (MS) models of cluster interactions.
CONCLUSIONS: We extend the applicability of the DAD algorithm to handle a wide variety of input data distributions and various outbreak models. DAD efficiently scans over millions of potential outbreak patterns and reports complex outbreak interactions accurately and in a timely manner, at a speed that meets the requirements of practical applications.
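The METHODS paragraph lists Poisson among the supported scoring assumptions but does not give the scoring functions themselves. As a stand-in, the sketch below uses the standard expectation-based Poisson log-likelihood ratio from the scan-statistics literature, which compares an observed aggregate count against its expected baseline. The function name `poisson_llr` and the example numbers are assumptions made for illustration and should not be read as DAD's actual score.

```python
import math


def poisson_llr(observed: float, expected: float) -> float:
    """Expectation-based Poisson log-likelihood ratio for increased activity.

    Compares the null hypothesis (rate equals the expected baseline) with the
    alternative (rate scaled up by observed / expected). Returns 0.0 when the
    observed count does not exceed the baseline, since only increases count.
    """
    if expected <= 0:
        raise ValueError("expected baseline must be positive")
    if observed <= expected:
        return 0.0
    return observed * math.log(observed / expected) + expected - observed


# Illustrative numbers only: a cluster whose aggregate count doubled.
score = poisson_llr(observed=20, expected=10)
print(round(score, 4))  # 3.8629
```

A search procedure would evaluate such a score for each candidate cluster and keep the highest-scoring ones; how DAD balances likelihood against overall complexity is not detailed in the abstract.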
format Online
Article
Text
id pubmed-3692787
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher University of Illinois at Chicago Library
record_format MEDLINE/PubMed
spelling pubmed-36927872013-06-26 Searching for Complex Patterns Using Disjunctive Anomaly Detection Sabhnani, Maheshkumar Dubrawski, Artur Schneider, Jeff Online J Public Health Inform ISDS 2012 Conference Abstracts University of Illinois at Chicago Library 2013-04-04 /pmc/articles/PMC3692787/ Text en ©2013 the author(s) http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/ojphi/about/submissions#copyrightNotice This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.
spellingShingle ISDS 2012 Conference Abstracts
Sabhnani, Maheshkumar
Dubrawski, Artur
Schneider, Jeff
Searching for Complex Patterns Using Disjunctive Anomaly Detection
title Searching for Complex Patterns Using Disjunctive Anomaly Detection
title_full Searching for Complex Patterns Using Disjunctive Anomaly Detection
title_fullStr Searching for Complex Patterns Using Disjunctive Anomaly Detection
title_full_unstemmed Searching for Complex Patterns Using Disjunctive Anomaly Detection
title_short Searching for Complex Patterns Using Disjunctive Anomaly Detection
title_sort searching for complex patterns using disjunctive anomaly detection
topic ISDS 2012 Conference Abstracts
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692787/
work_keys_str_mv AT sabhnanimaheshkumar searchingforcomplexpatternsusingdisjunctiveanomalydetection
AT dubrawskiartur searchingforcomplexpatternsusingdisjunctiveanomalydetection
AT schneiderjeff searchingforcomplexpatternsusingdisjunctiveanomalydetection