
Prefix Imputation of Orphan Events in Event Stream Processing

In the context of process mining, event logs consist of process instances called cases. Conformance checking is a process mining task that inspects whether a log file conforms to an existing process model. This inspection additionally quantifies the conformance in an explainable manner. O...

Bibliographic Details
Main authors:	Zaman, Rashid; Hassani, Marwan; Van Dongen, Boudewijn F.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Big Data
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8528154/ https://www.ncbi.nlm.nih.gov/pubmed/34693281 http://dx.doi.org/10.3389/fdata.2021.705243

author	Zaman, Rashid; Hassani, Marwan; Van Dongen, Boudewijn F.
author_sort	Zaman, Rashid
collection	PubMed
description	In the context of process mining, event logs consist of process instances called cases. Conformance checking is a process mining task that inspects whether a log file conforms to an existing process model. This inspection additionally quantifies the conformance in an explainable manner. Online conformance checking processes streaming event logs to provide precise insights into running cases and to mitigate non-conformance, if any, in a timely manner. State-of-the-art online conformance checking approaches bound memory either by limiting the number of events stored per case or by limiting the number of cases to a specific window width. The former technique still requires unbounded memory because the number of cases to store is unlimited, while the latter forgets running, not-yet-concluded cases to stay within the limited window width. Consequently, the processing system may later encounter events that represent an intermediate activity in the process model and whose case has been forgotten; these are referred to as orphan events. The naïve approach to an orphan event is either to exclude its case from conformance checking or to treat it as an altogether new case. However, this can produce misleading process insights, for instance, overestimated non-conformance. To bound memory yet still incorporate orphan events into processing, we propose a missing-prefix imputation approach for such events. Our approach uses the existing process model to impute the missing prefix. Furthermore, we leverage case storage management to increase the accuracy of the prefix prediction. We propose a systematic forgetting mechanism that identifies and forgets the cases whose prefixes can be reliably regenerated upon receipt of a future orphan event. We evaluate the efficacy of the proposed approach through multiple experiments with synthetic and three real event logs in a simulated streaming setting. Our approach achieves considerably more realistic conformance statistics than the state of the art while requiring the same storage.
format	Online Article Text
id	pubmed-8528154
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8528154 2021-10-21 Prefix Imputation of Orphan Events in Event Stream Processing Zaman, Rashid; Hassani, Marwan; Van Dongen, Boudewijn F. Front Big Data Big Data Frontiers Media S.A. 2021-10-06 /pmc/articles/PMC8528154/ /pubmed/34693281 http://dx.doi.org/10.3389/fdata.2021.705243 Text en Copyright © 2021 Zaman, Hassani and Van Dongen. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Prefix Imputation of Orphan Events in Event Stream Processing
topic	Big Data
