OnlineStats.jl: A Julia package for statistics on data streams
Main authors: | Day, Josh; Zhou, Hua |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7286575/ https://www.ncbi.nlm.nih.gov/pubmed/32524061 http://dx.doi.org/10.21105/joss.01816 |
author | Day, Josh; Zhou, Hua |
---|---|
collection | PubMed |
description | The growing prevalence of big and streaming data requires a new generation of tools. Data often has infinite size in the sense that new observations arrive continually (daily, hourly, etc.). In recent years, several new technologies such as Kafka (Apache Software Foundation, n.d.-a) and Spark Streaming (Apache Software Foundation, n.d.-b) have been introduced for processing streaming data. Statistical tools for data streams, however, are under-developed and offer only basic functionality. The majority of statistical software can only operate on finite batches and requires re-loading possibly large datasets for seemingly simple tasks such as incorporating a few more observations into an analysis. OnlineStats is a Julia (Bezanson, Edelman, Karpinski, & Shah, 2017) package for high-performance online algorithms. The OnlineStats framework is easily extensible, includes a large catalog of algorithms, provides primitives for parallel computing, and offers a weighting mechanism that allows new observations to have a higher relative influence over the value of the statistic/model/visualization. |
format | Online Article Text |
id | pubmed-7286575 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
journal | J Open Source Softw (Journal of Open Source Software) |
publication date | 2020-02-10 |
PMC record date | 2020-06-10 |
rights | Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC-BY, http://creativecommons.org/licenses/by/4.0/). |
title | OnlineStats.jl: A Julia package for statistics on data streams |
topic | Article |
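The abstract describes online algorithms that update a statistic one observation at a time, a weighting mechanism that lets newer observations carry more influence, and primitives for combining partial results computed in parallel. The Python sketch below illustrates those three ideas for a running mean only. It is not OnlineStats.jl's API (the package is written in Julia); the names `OnlineMean`, `equal_weight`, and `exponential_weight` are hypothetical, and the update rule μ ← (1 − γₙ)μ + γₙxₙ with weights 1/n or max(1/n, λ) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


def equal_weight(n: int) -> float:
    """Hypothetical weight 1/n: every observation counts equally (ordinary mean)."""
    return 1.0 / n


def exponential_weight(lam: float) -> Callable[[int], float]:
    """Hypothetical weight max(1/n, lam): the newest observation never
    gets less than a fixed share `lam` of the influence."""
    def weight(n: int) -> float:
        return max(1.0 / n, lam)
    return weight


@dataclass
class OnlineMean:
    """Running mean updated one observation at a time (illustrative sketch)."""
    weight: Callable[[int], float] = equal_weight
    mean: float = 0.0
    n: int = 0

    def fit(self, data: Iterable[float]) -> "OnlineMean":
        """Consume observations without storing them."""
        for x in data:
            self.n += 1
            gamma = self.weight(self.n)
            # Convex combination of the old estimate and the new observation;
            # at n == 1 both weight functions give gamma == 1, so mean == x.
            self.mean = (1.0 - gamma) * self.mean + gamma * x
        return self

    def merge(self, other: "OnlineMean") -> "OnlineMean":
        """Combine two partial results (e.g. from two workers).
        Assumes both sides used equal weights, so counts act as weights."""
        total = self.n + other.n
        if total > 0:
            self.mean = (self.n * self.mean + other.n * other.mean) / total
        self.n = total
        return self


if __name__ == "__main__":
    # Split a stream across two "workers", then merge their summaries.
    a = OnlineMean().fit([1.0, 2.0])
    b = OnlineMean().fit([3.0, 4.0])
    print(a.merge(b).mean)  # 2.5, same as one pass over [1, 2, 3, 4]

    # Exponential weighting lets the latest value dominate.
    print(OnlineMean(exponential_weight(0.5)).fit([0.0, 0.0, 0.0, 10.0]).mean)  # 5.0
```

With equal weights, merging the two halves reproduces the single-pass mean, which is what makes splitting a stream across workers safe for this statistic. With `exponential_weight(0.5)`, the stream `[0, 0, 0, 10]` yields 5.0 rather than 2.5, because the final observation keeps half of the influence. The simple count-weighted merge shown here is only valid for equal weights.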