FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams

Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., its probability distribution changes over time. Statistical models for non-stationary data str...


Bibliographic Details
Main Authors: Park, Namuk; Kim, Songkuk
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7915800/
https://www.ncbi.nlm.nih.gov/pubmed/33557367
http://dx.doi.org/10.3390/s21041080
_version_ 1783657330689179648
author Park, Namuk
Kim, Songkuk
author_facet Park, Namuk
Kim, Songkuk
author_sort Park, Namuk
collection PubMed
description Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., its probability distribution changes over time. Statistical models for non-stationary data streams demand agile adaptation for concept drift while tolerating temporal fluctuations. To this end, a statistical model needs to forget old data samples and to detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram for a new data sample and generates probability distribution by combining the ensemble of histograms while monitoring discrepancy between recent data and existing models periodically. When it detects concept drift, a new histogram is added to the ensemble and the oldest histogram is removed. This allows us to estimate the probability density function with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm shows improved speed and accuracy compared to existing methods for both stationary and non-stationary data streams.
format Online
Article
Text
id pubmed-7915800
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-79158002021-03-01 FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams Park, Namuk Kim, Songkuk Sensors (Basel) Article Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., its probability distribution changes over time. Statistical models for non-stationary data streams demand agile adaptation for concept drift while tolerating temporal fluctuations. To this end, a statistical model needs to forget old data samples and to detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram for a new data sample and generates probability distribution by combining the ensemble of histograms while monitoring discrepancy between recent data and existing models periodically. When it detects concept drift, a new histogram is added to the ensemble and the oldest histogram is removed. This allows us to estimate the probability density function with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm shows improved speed and accuracy compared to existing methods for both stationary and non-stationary data streams. MDPI 2021-02-04 /pmc/articles/PMC7915800/ /pubmed/33557367 http://dx.doi.org/10.3390/s21041080 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Park, Namuk
Kim, Songkuk
FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
title FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
title_full FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
title_fullStr FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
title_full_unstemmed FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
title_short FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
title_sort flexsketch: estimation of probability density for stationary and non-stationary data streams
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7915800/
https://www.ncbi.nlm.nih.gov/pubmed/33557367
http://dx.doi.org/10.3390/s21041080
work_keys_str_mv AT parknamuk flexsketchestimationofprobabilitydensityforstationaryandnonstationarydatastreams
AT kimsongkuk flexsketchestimationofprobabilitydensityforstationaryandnonstationarydatastreams
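
Illustrative sketch (not part of the catalog record). The description field above outlines the approach in prose: an ensemble of histograms, each updated for every new sample, a combined density estimate, a periodic discrepancy check, and replacement of the oldest histogram when drift is detected. The Python below is a minimal sketch of that outline under stated assumptions, not the authors' implementation. The class name `HistogramEnsemble`, the fixed value range, the equal-weight averaging, the L1 discrepancy measure, the threshold, and seeding a new histogram from the recent window are all hypothetical choices made only for illustration.

```python
from collections import deque


class HistogramEnsemble:
    """Hypothetical sketch of an ensemble-of-histograms density estimator.

    Assumptions (not taken from the record): fixed range [lo, hi), equal-width
    bins, equal-weight combination, L1 distance as the discrepancy measure.
    """

    def __init__(self, lo, hi, bins=10, max_models=3, window=100, threshold=0.5):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.width = (hi - lo) / bins
        self.window = window          # samples between discrepancy checks
        self.threshold = threshold    # L1 distance that counts as drift
        # deque(maxlen=...) drops the oldest histogram once the ensemble is full.
        self.models = deque([[0] * bins], maxlen=max_models)
        self.recent = [0] * bins      # histogram of the current window only
        self.recent_n = 0
        self.drifts = 0

    def _bin(self, x):
        # Out-of-range values are clamped into the edge bins (an assumption).
        i = int((x - self.lo) / self.width)
        return min(max(i, 0), self.bins - 1)

    def _combined_mass(self):
        # Equal-weight average of each non-empty histogram's bin probabilities.
        live = [h for h in self.models if sum(h) > 0]
        if not live:
            return [0.0] * self.bins
        return [sum(h[i] / sum(h) for h in live) / len(live)
                for i in range(self.bins)]

    def update(self, x):
        i = self._bin(x)
        for h in self.models:         # every histogram sees every new sample
            h[i] += 1
        self.recent[i] += 1
        self.recent_n += 1
        if self.recent_n == self.window:
            self._check_drift()

    def _check_drift(self):
        model = self._combined_mass()
        recent = [c / self.recent_n for c in self.recent]
        distance = sum(abs(m - r) for m, r in zip(model, recent))
        if distance > self.threshold:
            # Seed the new histogram with the recent window (an assumption).
            self.models.append(list(self.recent))
            self.drifts += 1
        self.recent = [0] * self.bins
        self.recent_n = 0

    def density(self, x):
        # Bin probability divided by bin width gives a piecewise-constant density.
        return self._combined_mass()[self._bin(x)] / self.width


est = HistogramEnsemble(0.0, 1.0, bins=10, max_models=2, window=100)
for _ in range(200):
    est.update(0.25)   # stationary phase
for _ in range(100):
    est.update(0.75)   # the distribution shifts
print(est.drifts, len(est.models), est.density(0.75) > est.density(0.25))
```

In this toy stream the shift is flagged at the first check after the change, and the combined estimate already favors the new region because the freshly added histogram contains only recent data. How the published algorithm weights histograms or measures discrepancy is not stated in the record, so those parts should be read as placeholders.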