
Leveraging change point detection to discover natural experiments in data

Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The meth...


Bibliographic Details
Main Authors: He, Yuzi; Burghardt, Keith A.; Lerman, Kristina
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440658/
https://www.ncbi.nlm.nih.gov/pubmed/36090462
http://dx.doi.org/10.1140/epjds/s13688-022-00361-7
author He, Yuzi
Burghardt, Keith A.
Lerman, Kristina
collection PubMed
description Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as before or after a candidate change point and trains a classifier to predict these labels. The accuracy of this classifier varies for different candidate change points. By modeling the accuracy change we can infer the true change point and fraction of data affected by the change (a proxy for detection confidence). We demonstrate how our framework can achieve low bias over a wide range of conditions and detect changes in high dimensional, noisy data more accurately than alternative methods. We use the framework to identify changes in real-world data and measure their effects using regression discontinuity designs, thereby uncovering potential natural experiments, such as the effect of pandemic lockdowns on air pollution and the effect of policy changes on performance and persistence in a learning platform. Our method opens new avenues for data-driven discovery due to its flexibility, accuracy and robustness in identifying changes in data.
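The two-step idea in the description (label points as before or after a candidate change point, train a classifier on those labels, then compare classifier quality across candidates) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the nearest-centroid classifier, the even/odd train/test split, the synthetic three-dimensional data, and all function names are hypothetical. In place of the paper's model of the accuracy curve, the sketch simply picks the candidate with the highest held-out balanced accuracy, and it does not estimate the fraction of data affected by the change.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_accuracy(data, candidate):
    """Held-out balanced accuracy of a nearest-centroid classifier that
    predicts whether each point lies before (0) or after (1) `candidate`."""
    labeled = [(x, int(i >= candidate)) for i, x in enumerate(data)]
    train, test = labeled[0::2], labeled[1::2]  # even indices train, odd test
    c0 = centroid([x for x, y in train if y == 0])
    c1 = centroid([x for x, y in train if y == 1])
    hits, totals = [0, 0], [0, 0]
    for x, y in test:
        pred = int(sq_dist(x, c1) < sq_dist(x, c0))
        totals[y] += 1
        hits[y] += int(pred == y)
    # Averaging per-class recall keeps very early or very late candidates
    # from scoring well just by predicting the majority label.
    return 0.5 * (hits[0] / totals[0] + hits[1] / totals[1])


def detect_change_point(data, candidates):
    """Score every candidate and return the best one plus all scores.
    Simplification: argmax of the scores, not a fitted accuracy model."""
    scores = {t: balanced_accuracy(data, t) for t in candidates}
    return max(scores, key=scores.get), scores


def make_synthetic(n=400, change=240, shift=6.0, dim=3, seed=0):
    """Hypothetical data: Gaussian noise whose first coordinate shifts
    by `shift` from index `change` onward."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if i >= change:
            x[0] += shift
        data.append(x)
    return data


data = make_synthetic()
estimate, scores = detect_change_point(data, range(40, 361, 20))
print("estimated change point:", estimate)
```

Balanced accuracy is used here because a candidate far from the true change point produces imbalanced labels, and raw accuracy would reward a classifier that always predicts the larger class. Any classifier could be substituted for the nearest-centroid rule, which is the sense in which the described framework is model-agnostic.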
format Online
Article
Text
id pubmed-9440658
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-9440658 2022-09-06 Leveraging change point detection to discover natural experiments in data He, Yuzi Burghardt, Keith A. Lerman, Kristina EPJ Data Sci Regular Article
Springer Berlin Heidelberg 2022-09-03 2022 /pmc/articles/PMC9440658/ /pubmed/36090462 http://dx.doi.org/10.1140/epjds/s13688-022-00361-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Leveraging change point detection to discover natural experiments in data
topic Regular Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440658/
https://www.ncbi.nlm.nih.gov/pubmed/36090462
http://dx.doi.org/10.1140/epjds/s13688-022-00361-7