A data generation framework for extremely rare case signals
Unlike data augmentation, data generation for extremely rare cases is an approach that can spawn a significant number of high-quality samples based on very few original data. This could be useful in anomaly detection and classification tasks that have the limitation of publicly available datasets fo...
Main authors: | Chalongvorachai, Thasorn; Woraratpanya, Kuntpong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8350195/ https://www.ncbi.nlm.nih.gov/pubmed/34401574 http://dx.doi.org/10.1016/j.heliyon.2021.e07687 |
_version_ | 1783735703525392384 |
---|---|
author | Chalongvorachai, Thasorn Woraratpanya, Kuntpong |
author_facet | Chalongvorachai, Thasorn Woraratpanya, Kuntpong |
author_sort | Chalongvorachai, Thasorn |
collection | PubMed |
description | Unlike data augmentation, data generation for extremely rare cases can produce a large number of high-quality samples from very few original data. This is useful in anomaly detection and classification tasks, where few datasets are publicly available for research. Other approaches, such as data augmentation techniques, have attempted to solve this problem, but nothing ensured the characteristics of the synthesized samples. Previously, we introduced a framework called Data Augmentation and Generation for Anomalous Time-series Signals (DAGAT), built from the following components: Data Augmentation, Variational Autoencoder (VAE), Data Picker (DP), Signal Fragment Assembler (SFA), and Quality Classifier (QC). An upgraded framework, An Advanced Data Generation for Anomalous Signals (ADGAS), was then introduced to eliminate the limitations of DAGAT, namely uncontrollable outputs and the possibility of bad data being included in a training set. By reforming the DAGAT architecture, ADGAS generates better samples. Nonetheless, ADGAS could be improved through better SFA, DP, and QC. Hence, this paper proposes a Data Generation Framework for Extremely Rare Case Signals. The proposed framework can generate reliable data for various objectives. We evaluated it using a 1D-CNN as the performance evaluator in multi-class anomaly classification, with the water treatment and water distribution testbeds (SWaT and WADI) as real-world anomaly datasets. The results show that it surpasses baseline anomaly data augmentation and data generation techniques. |
format | Online Article Text |
id | pubmed-8350195 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8350195 2021-08-15 A data generation framework for extremely rare case signals Chalongvorachai, Thasorn Woraratpanya, Kuntpong Heliyon Research Article Elsevier 2021-07-30 /pmc/articles/PMC8350195/ /pubmed/34401574 http://dx.doi.org/10.1016/j.heliyon.2021.e07687 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Chalongvorachai, Thasorn Woraratpanya, Kuntpong A data generation framework for extremely rare case signals |
title | A data generation framework for extremely rare case signals |
title_full | A data generation framework for extremely rare case signals |
title_fullStr | A data generation framework for extremely rare case signals |
title_full_unstemmed | A data generation framework for extremely rare case signals |
title_short | A data generation framework for extremely rare case signals |
title_sort | data generation framework for extremely rare case signals |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8350195/ https://www.ncbi.nlm.nih.gov/pubmed/34401574 http://dx.doi.org/10.1016/j.heliyon.2021.e07687 |
work_keys_str_mv | AT chalongvorachaithasorn adatagenerationframeworkforextremelyrarecasesignals AT woraratpanyakuntpong adatagenerationframeworkforextremelyrarecasesignals AT chalongvorachaithasorn datagenerationframeworkforextremelyrarecasesignals AT woraratpanyakuntpong datagenerationframeworkforextremelyrarecasesignals |