FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., its probability distribution changes over time. Statistical models for non-stationary data str...
Main authors: | Park, Namuk; Kim, Songkuk |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7915800/ https://www.ncbi.nlm.nih.gov/pubmed/33557367 http://dx.doi.org/10.3390/s21041080 |
_version_ | 1783657330689179648 |
---|---|
author | Park, Namuk Kim, Songkuk |
author_facet | Park, Namuk Kim, Songkuk |
author_sort | Park, Namuk |
collection | PubMed |
description | Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., its probability distribution changes over time. Statistical models for non-stationary data streams demand agile adaptation for concept drift while tolerating temporal fluctuations. To this end, a statistical model needs to forget old data samples and to detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram for a new data sample and generates probability distribution by combining the ensemble of histograms while monitoring discrepancy between recent data and existing models periodically. When it detects concept drift, a new histogram is added to the ensemble and the oldest histogram is removed. This allows us to estimate the probability density function with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm shows improved speed and accuracy compared to existing methods for both stationary and non-stationary data streams. |
format | Online Article Text |
id | pubmed-7915800 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-79158002021-03-01 FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams Park, Namuk Kim, Songkuk Sensors (Basel) Article Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., its probability distribution changes over time. Statistical models for non-stationary data streams demand agile adaptation for concept drift while tolerating temporal fluctuations. To this end, a statistical model needs to forget old data samples and to detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram for a new data sample and generates probability distribution by combining the ensemble of histograms while monitoring discrepancy between recent data and existing models periodically. When it detects concept drift, a new histogram is added to the ensemble and the oldest histogram is removed. This allows us to estimate the probability density function with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm shows improved speed and accuracy compared to existing methods for both stationary and non-stationary data streams. MDPI 2021-02-04 /pmc/articles/PMC7915800/ /pubmed/33557367 http://dx.doi.org/10.3390/s21041080 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Park, Namuk Kim, Songkuk FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams |
title | FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams |
title_sort | flexsketch: estimation of probability density for stationary and non-stationary data streams |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7915800/ https://www.ncbi.nlm.nih.gov/pubmed/33557367 http://dx.doi.org/10.3390/s21041080 |
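
The description above outlines the approach only in words: an ensemble of histograms covering different lengths of history, a periodic discrepancy check against recent data, and replacement of the oldest histogram when drift is detected. The following Python sketch illustrates that general idea under stated assumptions and is not the authors' FlexSketch implementation. The class names, the fixed bin range, the equal-weight averaging, the total variation distance, and the parameters `max_models`, `check_every`, and `threshold` are all hypothetical choices made for illustration.

```python
from collections import deque


class Histogram:
    """Fixed-range, equal-width histogram (an assumption of this sketch)."""

    def __init__(self, lo, hi, bins):
        self.lo = lo
        self.bins = bins
        self.width = (hi - lo) / bins
        self.counts = [0] * bins
        self.total = 0

    def index(self, x):
        # Clamp out-of-range samples into the first or last bin.
        i = int((x - self.lo) / self.width)
        return min(max(i, 0), self.bins - 1)

    def update(self, x):
        self.counts[self.index(x)] += 1
        self.total += 1

    def mass(self):
        # Probability mass per bin; all zeros while the histogram is empty.
        if self.total == 0:
            return [0.0] * self.bins
        return [c / self.total for c in self.counts]


class EnsembleSketch:
    """Hypothetical ensemble of histograms with a periodic drift check."""

    def __init__(self, lo, hi, bins=20, max_models=3,
                 check_every=100, threshold=0.3):
        self.args = (lo, hi, bins)
        self.max_models = max_models
        self.check_every = check_every
        self.threshold = threshold
        # Models created at different times cover different history lengths.
        self.models = deque([Histogram(lo, hi, bins)])
        self.recent = Histogram(lo, hi, bins)  # samples since the last check
        self.drifts = 0

    def combined_mass(self):
        # Equal-weight average of the non-empty models (assumed weighting).
        live = [m.mass() for m in self.models if m.total > 0]
        if not live:
            return [0.0] * self.args[2]
        return [sum(col) / len(live) for col in zip(*live)]

    def update(self, x):
        for m in self.models:
            m.update(x)
        self.recent.update(x)
        if self.recent.total >= self.check_every:
            self._check_drift()

    def _check_drift(self):
        # Total variation distance between recent data and the ensemble.
        p, q = self.recent.mass(), self.combined_mass()
        tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        if tv > self.threshold:
            # Drift: promote the recent histogram to a new ensemble member
            # and drop the oldest member if the ensemble is full.
            self.models.append(self.recent)
            if len(self.models) > self.max_models:
                self.models.popleft()
            self.drifts += 1
        self.recent = Histogram(*self.args)

    def pdf(self, x):
        # Density estimate: combined bin mass divided by bin width.
        ref = self.models[0]
        return self.combined_mass()[ref.index(x)] / ref.width
```

In this sketch, memory stays bounded because at most `max_models` histograms plus one buffer histogram exist at any time, and each update touches one bin per histogram.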