Leveraging change point detection to discover natural experiments in data
Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The meth...
Main Authors: | He, Yuzi; Burghardt, Keith A.; Lerman, Kristina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440658/ https://www.ncbi.nlm.nih.gov/pubmed/36090462 http://dx.doi.org/10.1140/epjds/s13688-022-00361-7 |
_version_ | 1784782398978785280 |
---|---|
author | He, Yuzi Burghardt, Keith A. Lerman, Kristina |
author_facet | He, Yuzi Burghardt, Keith A. Lerman, Kristina |
author_sort | He, Yuzi |
collection | PubMed |
description | Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as before or after a candidate change point and trains a classifier to predict these labels. The accuracy of this classifier varies for different candidate change points. By modeling the accuracy change we can infer the true change point and fraction of data affected by the change (a proxy for detection confidence). We demonstrate how our framework can achieve low bias over a wide range of conditions and detect changes in high dimensional, noisy data more accurately than alternative methods. We use the framework to identify changes in real-world data and measure their effects using regression discontinuity designs, thereby uncovering potential natural experiments, such as the effect of pandemic lockdowns on air pollution and the effect of policy changes on performance and persistence in a learning platform. Our method opens new avenues for data-driven discovery due to its flexibility, accuracy and robustness in identifying changes in data. |
format | Online Article Text |
id | pubmed-9440658 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-94406582022-09-06 Leveraging change point detection to discover natural experiments in data He, Yuzi Burghardt, Keith A. Lerman, Kristina EPJ Data Sci Regular Article Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high dimensional data is an ongoing challenge. We describe a self-training model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as before or after a candidate change point and trains a classifier to predict these labels. The accuracy of this classifier varies for different candidate change points. By modeling the accuracy change we can infer the true change point and fraction of data affected by the change (a proxy for detection confidence). We demonstrate how our framework can achieve low bias over a wide range of conditions and detect changes in high dimensional, noisy data more accurately than alternative methods. We use the framework to identify changes in real-world data and measure their effects using regression discontinuity designs, thereby uncovering potential natural experiments, such as the effect of pandemic lockdowns on air pollution and the effect of policy changes on performance and persistence in a learning platform. Our method opens new avenues for data-driven discovery due to its flexibility, accuracy and robustness in identifying changes in data. 
Springer Berlin Heidelberg 2022-09-03 2022 /pmc/articles/PMC9440658/ /pubmed/36090462 http://dx.doi.org/10.1140/epjds/s13688-022-00361-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Regular Article He, Yuzi Burghardt, Keith A. Lerman, Kristina Leveraging change point detection to discover natural experiments in data |
title | Leveraging change point detection to discover natural experiments in data |
title_full | Leveraging change point detection to discover natural experiments in data |
title_fullStr | Leveraging change point detection to discover natural experiments in data |
title_full_unstemmed | Leveraging change point detection to discover natural experiments in data |
title_short | Leveraging change point detection to discover natural experiments in data |
title_sort | leveraging change point detection to discover natural experiments in data |
topic | Regular Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440658/ https://www.ncbi.nlm.nih.gov/pubmed/36090462 http://dx.doi.org/10.1140/epjds/s13688-022-00361-7 |
work_keys_str_mv | AT heyuzi leveragingchangepointdetectiontodiscovernaturalexperimentsindata AT burghardtkeitha leveragingchangepointdetectiontodiscovernaturalexperimentsindata AT lermankristina leveragingchangepointdetectiontodiscovernaturalexperimentsindata |
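
The description field in this record outlines a two-step idea: label observations as before or after a candidate change point, train a classifier on those labels, and see how its accuracy varies across candidates. The sketch below is a minimal illustration of that idea, not the authors' implementation. It makes several assumptions that do not come from the record: a one-dimensional synthetic series, a nearest-centroid classifier, balanced accuracy on an interleaved held-out split, and simply taking the best-scoring candidate rather than modeling the accuracy curve as the description says the method does. All function names are hypothetical.

```python
import random
from statistics import mean


def balanced_accuracy(train, test):
    """Fit a nearest-centroid classifier on `train` and score it on `test`.

    Both arguments are lists of (value, label) pairs, where label 0 means
    "before" and label 1 means "after" the candidate change point.
    """
    c0 = mean([v for v, y in train if y == 0])
    c1 = mean([v for v, y in train if y == 1])

    def predict(v):
        return 0 if abs(v - c0) <= abs(v - c1) else 1

    # Average the per-class recall so that unbalanced labels near the
    # edges of the series do not inflate the score.
    recalls = []
    for cls in (0, 1):
        members = [v for v, y in test if y == cls]
        recalls.append(sum(predict(v) == cls for v in members) / len(members))
    return sum(recalls) / 2


def scan_candidates(series, margin=10):
    """Return {candidate index: held-out balanced accuracy}."""
    scores = {}
    for t in range(margin, len(series) - margin + 1):
        labeled = [(v, int(i >= t)) for i, v in enumerate(series)]
        # Interleaved split: even positions train, odd positions test.
        train, test = labeled[0::2], labeled[1::2]
        scores[t] = balanced_accuracy(train, test)
    return scores


def make_series(n=100, change_at=50, shift=3.0, seed=0):
    """Synthetic data (assumption): bounded noise plus a mean shift."""
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) + (shift if i >= change_at else 0.0)
            for i in range(n)]


series = make_series()
scores = scan_candidates(series)
# Simplification: take the best-scoring candidate instead of fitting a
# model to the whole accuracy curve.
estimate = max(scores, key=scores.get)
print(estimate, round(scores[estimate], 3))
```

Balanced accuracy is used here because plain accuracy is trivially high for candidates near either end of the series, where one label dominates. For high dimensional data, the nearest-centroid step could be swapped for any classifier, which is in keeping with the model-agnostic framing in the description.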