Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare, drug-discovery, etc., dataset shift detection has become an importa...
Main authors: | Song, Hoseung, Thiagarajan, Jayaraman J., Kailkhura, Bhavya |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Artificial Intelligence |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8223254/ https://www.ncbi.nlm.nih.gov/pubmed/34179767 http://dx.doi.org/10.3389/frai.2021.589632 |
_version_ | 1783711655714095104 |
---|---|
author | Song, Hoseung Thiagarajan, Jayaraman J. Kailkhura, Bhavya |
author_facet | Song, Hoseung Thiagarajan, Jayaraman J. Kailkhura, Bhavya |
author_sort | Song, Hoseung |
collection | PubMed |
description | Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare, drug-discovery, etc., dataset shift detection has become an important research issue in machine learning. Though several existing efforts have focused on image/video data, applications with graph-structured data have not received sufficient attention. Therefore, in this paper, we investigate the problem of detecting shifts in graph structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical two-sample test based approach for shift detection in large-scale graph structured data. Our approach is very flexible in that it is suitable for both undirected and directed graphs, and eliminates the need for equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings using real-world datasets, characterized by directed graphs and a large number of nodes. |
format | Online Article Text |
id | pubmed-8223254 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8223254 2021-06-25 Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications Song, Hoseung Thiagarajan, Jayaraman J. Kailkhura, Bhavya Front Artif Intell Artificial Intelligence Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare, drug-discovery, etc., dataset shift detection has become an important research issue in machine learning. Though several existing efforts have focused on image/video data, applications with graph-structured data have not received sufficient attention. Therefore, in this paper, we investigate the problem of detecting shifts in graph structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical two-sample test based approach for shift detection in large-scale graph structured data. Our approach is very flexible in that it is suitable for both undirected and directed graphs, and eliminates the need for equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings using real-world datasets, characterized by directed graphs and a large number of nodes. Frontiers Media S.A. 2021-05-18 /pmc/articles/PMC8223254/ /pubmed/34179767 http://dx.doi.org/10.3389/frai.2021.589632 Text en Copyright © 2021 Song, Thiagarajan and Kailkhura. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Artificial Intelligence Song, Hoseung Thiagarajan, Jayaraman J. Kailkhura, Bhavya Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications |
title | Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications |
title_full | Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications |
title_fullStr | Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications |
title_full_unstemmed | Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications |
title_short | Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications |
title_sort | preventing failures by dataset shift detection in safety-critical graph applications |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8223254/ https://www.ncbi.nlm.nih.gov/pubmed/34179767 http://dx.doi.org/10.3389/frai.2021.589632 |
work_keys_str_mv | AT songhoseung preventingfailuresbydatasetshiftdetectioninsafetycriticalgraphapplications AT thiagarajanjayaramanj preventingfailuresbydatasetshiftdetectioninsafetycriticalgraphapplications AT kailkhurabhavya preventingfailuresbydatasetshiftdetectioninsafetycriticalgraphapplications |