Fast Multidimensional Subset Scan for Outbreak Detection and Characterization

Bibliographic Details
Main Authors: Neill, Daniel B., Kumar, Tarun
Format: Online, Article, Text
Language: English
Published: University of Illinois at Chicago Library, 2013
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692941/
collection PubMed
description OBJECTIVE: We present Multidimensional Subset Scan (MD-Scan), a new method for early outbreak detection and characterization using multivariate case data from individuals in a population. MD-Scan extends previous work on multivariate event detection by identifying the characteristics of the affected subpopulation, and it enables more timely and accurate detection while remaining computationally tractable.

INTRODUCTION: The multivariate linear-time subset scan (MLTSS) [1] extends previous spatial and subset scanning methods [2–3] to achieve timely and accurate event detection in massive multivariate datasets, efficiently optimizing a likelihood ratio statistic over proximity-constrained subsets of locations and all subsets of the monitored data streams. However, some disease outbreaks affect only a subpopulation of the monitored population (e.g., an age group, a gender, or individuals engaging in a specific high-risk behavior), and MLTSS cannot use this additional information to enhance detection.

METHODS: Rather than using aggregate counts for each monitored location and data stream, we assume a set of multivariate data records, one for each affected individual, with attributes such as date, home zip code, prodrome, gender, and age decile. MD-Scan jointly optimizes the likelihood ratio statistic over subsets of the values of each monitored attribute, identifying a space-time region (a subset of locations and time steps) and a subpopulation (including gender(s) and age groups) in which the number of recent cases for a subset of the monitored prodromes is significantly higher than expected. To do so, it uses the linear-time subset scanning property [3] to optimize efficiently and exactly over subsets of one attribute, conditioned on the current subsets of all other attributes. MD-Scan iterates over all attributes until it converges to a local optimum, and it performs multiple random restarts to approach the global optimum.
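The conditional optimization and iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its likelihood ratio statistic, so the sketch assumes the expectation-based Poisson (EBP) score, for which the linear-time subset scanning property holds with priority count/baseline: sort an attribute's values by priority, and the best subset is always one of the top-k prefixes. The data layout (a dense dict from attribute-value tuples to `(count, baseline)` pairs, with every combination present and every baseline positive), the random initialization, and the names `ebp_score`, `ltss`, `conditional_totals`, and `md_scan` are all hypothetical.

```python
import math
import random


def ebp_score(c, b):
    """Expectation-based Poisson log-likelihood ratio.

    Assumed statistic: the abstract does not say which one MD-Scan uses.
    """
    if b <= 0 or c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def ltss(counts, baselines):
    """Exactly maximize the score over subsets of one attribute's values.

    Values are sorted by priority count/baseline (baselines assumed > 0);
    only the top-k prefixes are scored. sorted() is stable, so ties keep
    their first-seen order.
    """
    order = sorted(counts, key=lambda v: counts[v] / baselines[v], reverse=True)
    best_score, best_subset = float("-inf"), set()
    c_sum = b_sum = 0.0
    for k, v in enumerate(order, start=1):
        c_sum += counts[v]
        b_sum += baselines[v]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_subset = score, set(order[:k])
    return best_score, best_subset


def conditional_totals(cells, subsets, j):
    """Sum count and baseline per value of attribute j, keeping only cells
    whose other attribute values lie in the current subsets."""
    counts, baselines = {}, {}
    for key, (c, b) in cells.items():
        if all(key[i] in subsets[i] for i in range(len(key)) if i != j):
            counts[key[j]] = counts.get(key[j], 0.0) + c
            baselines[key[j]] = baselines.get(key[j], 0.0) + b
    return counts, baselines


def md_scan(cells, domains, restarts=10, max_iters=50, seed=0):
    """Coordinate ascent over attributes, repeated from random starts.

    cells: dict mapping a tuple of attribute values to (count, baseline);
           assumed dense (every combination present, baselines > 0).
    domains: list of lists holding the possible values of each attribute.
    """
    rng = random.Random(seed)
    best_score, best_subsets = float("-inf"), None
    for _ in range(restarts):
        # Hypothetical initialization: keep each value with probability 1/2,
        # falling back to one random value so no subset starts empty.
        subsets = []
        for dom in domains:
            chosen = {v for v in dom if rng.random() < 0.5}
            subsets.append(chosen or {rng.choice(dom)})
        score = float("-inf")
        for _ in range(max_iters):
            prev = score
            for j in range(len(domains)):
                # Optimize attribute j exactly, holding all others fixed.
                score, subsets[j] = ltss(*conditional_totals(cells, subsets, j))
            if score <= prev:  # a full sweep gave no gain: local optimum
                break
        if score > best_score:
            best_score, best_subsets = score, [set(s) for s in subsets]
    return best_score, best_subsets


if __name__ == "__main__":
    locs, genders, ages = [0, 1, 2, 3], ["F", "M"], ["child", "adult", "elderly"]
    # Made-up toy data: baseline 10 per cell, counts tripled for female
    # children and elderly in locations 0 and 1.
    cells = {
        (loc, g, a): (30.0 if loc in (0, 1) and g == "F" and a != "adult" else 10.0, 10.0)
        for loc in locs for g in genders for a in ages
    }
    print(md_scan(cells, [locs, genders, ages]))
```

Each conditional step can only raise the score, because the current subset of attribute j is itself one of the candidates that LTSS searches. The loop therefore stops at a local optimum, and the restarts give it several chances to find a better one. On the toy data above, the sketch recovers locations {0, 1}, gender {F}, and ages {child, elderly}.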
Additional constraints, including spatial proximity, temporal contiguity, and connectedness, can be incorporated into each conditional optimization step. More details are provided in [4].

RESULTS: We evaluated MD-Scan using simulated disease outbreaks injected into real-world Emergency Department data from Allegheny County, PA. Each outbreak was assumed to differentially affect a specific subpopulation (e.g., “adult females” or “children and the elderly”). MD-Scan achieved significantly earlier detection than MLTSS when the distribution of injected cases across the monitored attributes differed sufficiently from the background data, particularly when multiple attributes were affected or the inject was biased toward a less common attribute value. For simulated gender-specific and age-biased injects that affected only children and the elderly, MD-Scan detected outbreaks more than one day earlier than MLTSS and achieved 10% higher spatial accuracy. MD-Scan also accurately identified the affected age and gender groups (Figure 1), whereas MLTSS does not characterize the affected subpopulation. Although MD-Scan was 9× slower than MLTSS, it remained fast, requiring an average of 4.15 seconds per day of data.

CONCLUSIONS: Our results demonstrate that MD-Scan can accurately identify the subpopulation affected by an outbreak, represented as a subset of values for each monitored attribute. MD-Scan also substantially improves the timeliness and accuracy of detection for outbreaks that differentially affect a subset of the monitored population. Incorporating additional constraints, such as spatial proximity and graph connectivity, into the iterative MD-Scan procedure further enhanced detection performance.

[Figure: see text]
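Both the METHODS and the CONCLUSIONS mention adding a spatial proximity constraint to the conditional optimization step. One way to do this, which is an assumption of this sketch and not a detail given in the abstract, is to restrict the location step to each center location's k nearest neighbors (including the center), run LTSS inside each neighborhood, and keep the best result. The sketch repeats a hypothetical EBP score and LTSS helper so that it runs on its own. The names `proximity_constrained_scan`, `coords`, and `k` are illustrative and do not come from the paper.

```python
import math


def ebp_score(c, b):
    """Expectation-based Poisson log-likelihood ratio (assumed statistic)."""
    if b <= 0 or c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def ltss(counts, baselines, candidates):
    """Exact LTSS over subsets of `candidates` (baselines assumed > 0)."""
    order = sorted(candidates, key=lambda v: counts[v] / baselines[v], reverse=True)
    best_score, best_subset = float("-inf"), set()
    c_sum = b_sum = 0.0
    for k, v in enumerate(order, start=1):
        c_sum += counts[v]
        b_sum += baselines[v]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_subset = score, set(order[:k])
    return best_score, best_subset


def proximity_constrained_scan(counts, baselines, coords, k):
    """Best subset of any center's k nearest locations (center included).

    coords: dict mapping each location to an (x, y) point (hypothetical).
    """
    best_score, best_subset = float("-inf"), set()
    for center in coords:
        # The k locations closest to this center, the center itself first.
        neighbors = sorted(coords, key=lambda loc: math.dist(coords[center], coords[loc]))[:k]
        score, subset = ltss(counts, baselines, neighbors)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset


if __name__ == "__main__":
    # Made-up example: D is the most elevated location but far from A and B.
    coords = {"A": (0, 0), "B": (1, 0), "C": (3, 0), "D": (10, 0)}
    counts = {"A": 25.0, "B": 25.0, "C": 10.0, "D": 30.0}
    baselines = {loc: 10.0 for loc in coords}
    print(ltss(counts, baselines, list(coords)))                      # unconstrained
    print(proximity_constrained_scan(counts, baselines, coords, k=2))  # k = 2
```

On this toy data the unconstrained scan returns the scattered set {A, B, D}, while the k = 2 constrained scan returns the compact pair {A, B}. In an MD-Scan-style loop, a function like this would replace the plain LTSS call only for the location attribute.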
id pubmed-3692941
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Online J Public Health Inform, ISDS 2012 Conference Abstracts; University of Illinois at Chicago Library, 2013-04-04
rights ©2013 the author(s). This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes. http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/ojphi/about/submissions#copyrightNotice
topic ISDS 2012 Conference Abstracts