
Building test data from real outbreaks for evaluating detection algorithms

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated s...
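The full description in this record says the method applies a homothetic transformation to a historical outbreak distribution and then resamples it, for example with the Inverse Transform Sampling Method (ITSM). The following is a minimal Python sketch of one plausible reading of that idea, not the authors' implementation. The function names `rescale_shape` and `sample_outbreak`, the linear interpolation used for rescaling, and the example counts are all assumptions made for illustration.

```python
import bisect
import random
from itertools import accumulate


def rescale_shape(historical_counts, target_days):
    """Stretch or compress a daily epidemic curve to target_days.

    Returns one probability per simulated day (the values sum to 1).
    Linear interpolation is an assumption of this sketch.
    """
    n = len(historical_counts)
    if n == 0 or target_days <= 0:
        raise ValueError("need a non-empty curve and a positive duration")
    weights = []
    for day in range(target_days):
        # Map the centre of each simulated day onto the historical time axis.
        pos = (day + 0.5) * n / target_days - 0.5
        pos = min(max(pos, 0.0), n - 1.0)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Interpolate between the two neighbouring historical days.
        weights.append((1 - frac) * historical_counts[lo]
                       + frac * historical_counts[hi])
    total = sum(weights)
    if total <= 0:
        raise ValueError("rescaled curve carries no cases")
    return [w / total for w in weights]


def sample_outbreak(probabilities, n_cases, rng):
    """Inverse transform sampling: assign each case to a day via the CDF."""
    cdf = list(accumulate(probabilities))
    counts = [0] * len(probabilities)
    for _ in range(n_cases):
        u = rng.random() * cdf[-1]
        # First day whose cumulative probability exceeds u.
        day = bisect.bisect_right(cdf, u)
        counts[min(day, len(counts) - 1)] += 1
    return counts


# Hypothetical historical curve: 8 days, 38 cases.
historical = [1, 3, 8, 12, 7, 4, 2, 1]
shape = rescale_shape(historical, target_days=12)
simulated = sample_outbreak(shape, n_cases=60, rng=random.Random(42))
```

Here `simulated` is a 12-day outbreak signal with 60 cases that follows the rescaled historical shape. Because each case is drawn independently, daily counts produced this way do not depend on one another, which matches the abstract's remark that binomial and ITSM methods suit situations where dependency matters little.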


Bibliographic Details
Main authors:	Texier, Gaetan; Jackson, Michael L.; Siwe, Leonel; Meynard, Jean-Baptiste; Deparis, Xavier; Chaudet, Herve
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5593515/ https://www.ncbi.nlm.nih.gov/pubmed/28863159 http://dx.doi.org/10.1371/journal.pone.0183992

collection	PubMed
description	Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes and durations, is known to be very difficult. The dataset of generated outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by a resampling process (Binomial, Inverse Transform Sampling Method [ITSM], Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the input parameters that matter most for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms that the type of algorithm used and the simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) influence the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased as the number of simulated days increased and improved as the number of simulated cases increased. Simulating outbreaks with fewer cases than days of duration (i.e. an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
id	pubmed-5593515
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	PLoS One
date	2017-09-01
rights	© 2017 Texier et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
