A data generation framework for extremely rare case signals

Bibliographic Details
Main authors: Chalongvorachai, Thasorn; Woraratpanya, Kuntpong
Format: Online, Article, Text
Language: English
Published: Elsevier, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8350195/
https://www.ncbi.nlm.nih.gov/pubmed/34401574
http://dx.doi.org/10.1016/j.heliyon.2021.e07687
Collection: PubMed
Abstract: Unlike data augmentation, data generation for extremely rare cases is an approach that can produce a large number of high-quality samples from very little original data. This is useful for anomaly detection and classification tasks, where publicly available datasets for research purposes are limited. Although other approaches, such as data augmentation techniques, have attempted to solve this problem, none of them ensured the characteristics of the synthesized samples. Previously, we introduced a framework called Data Augmentation and Generation for Anomalous Time-series Signals (DAGAT), which combined several important components: Data Augmentation, a Variational Autoencoder (VAE), a Data Picker (DP), a Signal Fragment Assembler (SFA), and a Quality Classifier (QC). An upgraded framework, An Advanced Data Generation for Anomalous Signals (ADGAS), was then introduced to eliminate the limitations of DAGAT, namely uncontrollable outputs and the possibility of bad data being included in the training set. By reforming the DAGAT architecture, ADGAS produces better generated samples. Nonetheless, ADGAS could be improved through a better SFA, DP, and QC. Hence, this paper proposes a Data Generation Framework for Extremely Rare Case Signals, which can generate reliable data for various objectives. We tested this framework using a 1D-CNN as the performance evaluator in multi-class anomaly classification, with the water treatment and water distribution testbeds (SWaT and WADI) as the real-world anomaly datasets. The results show that it outperforms other baseline anomaly data augmentation and data generation methods.
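The abstract contrasts the proposed framework with conventional data augmentation, which cannot guarantee the characteristics of the synthesized samples. For readers unfamiliar with that baseline, the following is a minimal sketch of two generic time-series augmentations, jittering (additive Gaussian noise) and scaling (one random multiplicative factor per copy), used to multiply a handful of rare signals. It is not the authors' DAGAT, ADGAS, or proposed framework; the function names, the toy signals, and the noise levels (0.03 for jitter, 0.1 for scaling) are illustrative assumptions.

```python
import random
from typing import List

Signal = List[float]


def jitter(signal: Signal, sigma: float, rng: random.Random) -> Signal:
    """Add independent Gaussian noise N(0, sigma) to every point (illustrative)."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def scale(signal: Signal, sigma: float, rng: random.Random) -> Signal:
    """Multiply the whole signal by one factor drawn from N(1, sigma) (illustrative)."""
    factor = rng.gauss(1.0, sigma)
    return [x * factor for x in signal]


def augment(originals: List[Signal], n_per_original: int, seed: int = 0) -> List[Signal]:
    """Create n_per_original jittered-then-scaled copies of each original signal.

    The seed makes the output reproducible; the noise levels are assumed values.
    """
    rng = random.Random(seed)
    out: List[Signal] = []
    for sig in originals:
        for _ in range(n_per_original):
            out.append(scale(jitter(sig, 0.03, rng), 0.1, rng))
    return out


if __name__ == "__main__":
    # Two hypothetical "rare" anomalous signals of five samples each.
    rare = [[0.0, 0.2, 1.5, 0.3, 0.0], [0.1, 0.1, 2.0, 0.4, 0.1]]
    synthetic = augment(rare, n_per_original=50)
    print(len(synthetic))  # 100
```

Because every copy is only a random perturbation of an original, nothing here checks whether the result still resembles the anomaly class; the abstract identifies this lack of guarantees as the limitation the proposed framework targets.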
Record ID: pubmed-8350195
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Heliyon (Research Article), Elsevier, 2021-07-30
Rights: © 2021 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).