Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications

Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare, drug-discovery, etc., dataset shift detection has become an importa...

Bibliographic Details
Main authors: Song, Hoseung, Thiagarajan, Jayaraman J., Kailkhura, Bhavya
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Artificial Intelligence
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8223254/
https://www.ncbi.nlm.nih.gov/pubmed/34179767
http://dx.doi.org/10.3389/frai.2021.589632
_version_ 1783711655714095104
author Song, Hoseung
Thiagarajan, Jayaraman J.
Kailkhura, Bhavya
author_facet Song, Hoseung
Thiagarajan, Jayaraman J.
Kailkhura, Bhavya
author_sort Song, Hoseung
collection PubMed
description Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare, drug-discovery, etc., dataset shift detection has become an important research issue in machine learning. Though several existing efforts have focused on image/video data, applications with graph-structured data have not received sufficient attention. Therefore, in this paper, we investigate the problem of detecting shifts in graph structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical two-sample test based approach for shift detection in large-scale graph structured data. Our approach is very flexible in that it is suitable for both undirected and directed graphs, and eliminates the need for equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings using real-world datasets, characterized by directed graphs and a large number of nodes.
format Online
Article
Text
id pubmed-8223254
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8223254 2021-06-25 Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications Song, Hoseung Thiagarajan, Jayaraman J. Kailkhura, Bhavya Front Artif Intell Artificial Intelligence Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare, drug-discovery, etc., dataset shift detection has become an important research issue in machine learning. Though several existing efforts have focused on image/video data, applications with graph-structured data have not received sufficient attention. Therefore, in this paper, we investigate the problem of detecting shifts in graph structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical two-sample test based approach for shift detection in large-scale graph structured data. Our approach is very flexible in that it is suitable for both undirected and directed graphs, and eliminates the need for equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings using real-world datasets, characterized by directed graphs and a large number of nodes. Frontiers Media S.A. 2021-05-18 /pmc/articles/PMC8223254/ /pubmed/34179767 http://dx.doi.org/10.3389/frai.2021.589632 Text en Copyright © 2021 Song, Thiagarajan and Kailkhura. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. 
No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Song, Hoseung
Thiagarajan, Jayaraman J.
Kailkhura, Bhavya
Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
title Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
title_full Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
title_fullStr Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
title_full_unstemmed Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
title_short Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
title_sort preventing failures by dataset shift detection in safety-critical graph applications
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8223254/
https://www.ncbi.nlm.nih.gov/pubmed/34179767
http://dx.doi.org/10.3389/frai.2021.589632
work_keys_str_mv AT songhoseung preventingfailuresbydatasetshiftdetectioninsafetycriticalgraphapplications
AT thiagarajanjayaramanj preventingfailuresbydatasetshiftdetectioninsafetycriticalgraphapplications
AT kailkhurabhavya preventingfailuresbydatasetshiftdetectioninsafetycriticalgraphapplications