OnlineStats.jl: A Julia package for statistics on data streams


Bibliographic Details
Main authors: Day, Josh; Zhou, Hua
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7286575/
https://www.ncbi.nlm.nih.gov/pubmed/32524061
http://dx.doi.org/10.21105/joss.01816
author Day, Josh
Zhou, Hua
collection PubMed
description The growing prevalence of big and streaming data requires a new generation of tools. Data often has infinite size in the sense that new observations are continually arriving daily, hourly, etc. In recent years, several new technologies such as Kafka (Apache Software Foundation, n.d.-a) and Spark Streaming (Apache Software Foundation, n.d.-b) have been introduced for processing streaming data. Statistical tools for data streams, however, are underdeveloped and offer only basic functionality. The majority of statistical software can operate only on finite batches and requires reloading possibly large datasets for seemingly simple tasks such as incorporating a few more observations into an analysis. OnlineStats is a Julia (Bezanson, Edelman, Karpinski, & Shah, 2017) package for high-performance online algorithms. The OnlineStats framework is easily extensible, includes a large catalog of algorithms, provides primitives for parallel computing, and offers a weighting mechanism that allows new observations to have a higher relative influence over the value of the statistic/model/visualization.
format Online
Article
Text
id pubmed-7286575
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7286575 2020-06-10 OnlineStats.jl: A Julia package for statistics on data streams Day, Josh Zhou, Hua J Open Source Softw Article The growing prevalence of big and streaming data requires a new generation of tools. Data often has infinite size in the sense that new observations are continually arriving daily, hourly, etc. In recent years, several new technologies such as Kafka (Apache Software Foundation, n.d.-a) and Spark Streaming (Apache Software Foundation, n.d.-b) have been introduced for processing streaming data. Statistical tools for data streams, however, are underdeveloped and offer only basic functionality. The majority of statistical software can operate only on finite batches and requires reloading possibly large datasets for seemingly simple tasks such as incorporating a few more observations into an analysis. OnlineStats is a Julia (Bezanson, Edelman, Karpinski, & Shah, 2017) package for high-performance online algorithms. The OnlineStats framework is easily extensible, includes a large catalog of algorithms, provides primitives for parallel computing, and offers a weighting mechanism that allows new observations to have a higher relative influence over the value of the statistic/model/visualization. 2020-02-10 2020 /pmc/articles/PMC7286575/ /pubmed/32524061 http://dx.doi.org/10.21105/joss.01816 Text en Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC-BY, http://creativecommons.org/licenses/by/4.0/).
title OnlineStats.jl: A Julia package for statistics on data streams
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7286575/
https://www.ncbi.nlm.nih.gov/pubmed/32524061
http://dx.doi.org/10.21105/joss.01816