Leveraging change point detection to discover natural experiments in data

Bibliographic Details
Main Authors:	He, Yuzi, Burghardt, Keith A., Lerman, Kristina
Format:	Online Article Text
Language:	English
Published:	Springer Berlin Heidelberg 2022
Subjects:	Regular Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440658/ https://www.ncbi.nlm.nih.gov/pubmed/36090462 http://dx.doi.org/10.1140/epjds/s13688-022-00361-7

Description
Summary:	Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high-dimensional data is an ongoing challenge. We describe a self-training, model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as before or after a candidate change point and trains a classifier to predict these labels. The accuracy of this classifier varies across candidate change points. Second, by modeling how the accuracy changes, we can infer the true change point and the fraction of data affected by the change (a proxy for detection confidence). We demonstrate that our framework achieves low bias over a wide range of conditions and detects changes in high-dimensional, noisy data more accurately than alternative methods. We use the framework to identify changes in real-world data and measure their effects using regression discontinuity designs, thereby uncovering potential natural experiments, such as the effect of pandemic lockdowns on air pollution and the effect of policy changes on performance and persistence in a learning platform. Our method opens new avenues for data-driven discovery due to its flexibility, accuracy and robustness in identifying changes in data.
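
The two steps in the summary can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and it rests on several assumptions. The data are synthetic (a two-feature Gaussian series with a mean shift at a made-up index, 250). A nearest-centroid rule stands in for the classifier, which the summary leaves open. Instead of the accuracy model described in the summary, the sketch simply picks the candidate with the highest held-out balanced accuracy, so it does not estimate the fraction of data affected. All function names and parameters (`balanced_accuracy_at`, `estimate_change_point`, `margin`, `step`) are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_accuracy_at(data, c):
    """Step 1: label rows before index c as 0 and the rest as 1, fit a
    nearest-centroid classifier on even-indexed rows, and return its
    balanced accuracy (mean per-class recall) on odd-indexed rows."""
    labels = [0 if i < c else 1 for i in range(len(data))]
    pairs = list(zip(data, labels))
    train = [p for i, p in enumerate(pairs) if i % 2 == 0]
    test = [p for i, p in enumerate(pairs) if i % 2 == 1]
    cents = [centroid([x for x, y in train if y == k]) for k in (0, 1)]
    hits, totals = [0, 0], [0, 0]
    for x, y in test:
        pred = 0 if sq_dist(x, cents[0]) <= sq_dist(x, cents[1]) else 1
        totals[y] += 1
        hits[y] += int(pred == y)
    return 0.5 * (hits[0] / totals[0] + hits[1] / totals[1])


def estimate_change_point(data, margin=40, step=5):
    """Step 2 (simplified): scan candidate change points and return the one
    with the highest held-out balanced accuracy, plus all scores.
    The margin keeps both classes populated at every candidate."""
    candidates = range(margin, len(data) - margin + 1, step)
    scores = {c: balanced_accuracy_at(data, c) for c in candidates}
    return max(scores, key=scores.get), scores


def make_synthetic(n=400, true_cp=250, shift=3.0, seed=0):
    """Assumed toy data: unit-variance Gaussian noise in two features,
    with the mean of both features shifting by `shift` at index true_cp."""
    rng = random.Random(seed)
    return [
        [rng.gauss(shift if i >= true_cp else 0.0, 1.0) for _ in range(2)]
        for i in range(n)
    ]


data = make_synthetic()
estimate, scores = estimate_change_point(data)
print("estimated change point:", estimate)
```

The intuition is that labels are easiest to predict when the candidate split coincides with the real change: any other split puts observations from the same regime on both sides of the boundary, which lowers the classifier's accuracy. Balanced accuracy is used here so that candidates near the ends of the series, where one label dominates, are not favored by class imbalance alone.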
