Estimating Multilevel Models on Data Streams

Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely...

Bibliographic Details
Main Authors: Ippel, L., Kaptein, M. C., Vermunt, J. K.
Format: Online Article Text
Language: English
Published: Springer US 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6373343/
https://www.ncbi.nlm.nih.gov/pubmed/30671789
http://dx.doi.org/10.1007/s11336-018-09656-z
author Ippel, L.
Kaptein, M. C.
Vermunt, J. K.
collection PubMed
description Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and end up consuming much time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or “row-by-row”). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive to current state-of-the-art methods while being orders of magnitude faster. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s11336-018-09656-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6373343
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-6373343 2019-03-01 Estimating Multilevel Models on Data Streams Ippel, L. Kaptein, M. C. Vermunt, J. K. Psychometrika Article Springer US 2019-01-22 2019 /pmc/articles/PMC6373343/ /pubmed/30671789 http://dx.doi.org/10.1007/s11336-018-09656-z Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title Estimating Multilevel Models on Data Streams
topic Article