
A Hybrid Missing Data Imputation Method for Batch Process Monitoring Dataset

Batch process monitoring datasets usually contain missing data, which decreases the performance of data-driven modeling for fault identification and optimal control. Many methods have been proposed to impute missing data; however, they do not fulfill the need for data quality, especially in sensor d...


Bibliographic Details
Main Authors: Gan, Qihong, Gong, Lang, Hu, Dasha, Jiang, Yuming, Ding, Xuefeng
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650138/
https://www.ncbi.nlm.nih.gov/pubmed/37960379
http://dx.doi.org/10.3390/s23218678
_version_ 1785135711381356544
author Gan, Qihong
Gong, Lang
Hu, Dasha
Jiang, Yuming
Ding, Xuefeng
author_facet Gan, Qihong
Gong, Lang
Hu, Dasha
Jiang, Yuming
Ding, Xuefeng
author_sort Gan, Qihong
collection PubMed
description Batch process monitoring datasets usually contain missing data, which decreases the performance of data-driven modeling for fault identification and optimal control. Many methods have been proposed to impute missing data; however, they do not fulfill the need for data quality, especially in sensor datasets with different types of missing data. We propose a hybrid missing data imputation method for batch process monitoring datasets with multi-type missing data. In this method, the missing data is first classified into five categories based on the continuous missing duration and the number of variables missing simultaneously. Then, different categories of missing data are step-by-step imputed considering their unique characteristics. A combination of three single-dimensional interpolation models is employed to impute transient isolated missing values. An iterative imputation based on a multivariate regression model is designed for imputing long-term missing variables, and a combination model based on single-dimensional interpolation and multivariate regression is proposed for imputing short-term missing variables. The Long Short-Term Memory (LSTM) model is utilized to impute both short-term and long-term missing samples. Finally, a series of experiments for different categories of missing data were conducted based on a real-world batch process monitoring dataset. The results demonstrate that the proposed method achieves higher imputation accuracy than other comparative methods.
format Online
Article
Text
id pubmed-10650138
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10650138 2023-10-24 A Hybrid Missing Data Imputation Method for Batch Process Monitoring Dataset Gan, Qihong Gong, Lang Hu, Dasha Jiang, Yuming Ding, Xuefeng Sensors (Basel) Article Batch process monitoring datasets usually contain missing data, which decreases the performance of data-driven modeling for fault identification and optimal control. Many methods have been proposed to impute missing data; however, they do not fulfill the need for data quality, especially in sensor datasets with different types of missing data. We propose a hybrid missing data imputation method for batch process monitoring datasets with multi-type missing data. In this method, the missing data is first classified into five categories based on the continuous missing duration and the number of variables missing simultaneously. Then, different categories of missing data are step-by-step imputed considering their unique characteristics. A combination of three single-dimensional interpolation models is employed to impute transient isolated missing values. An iterative imputation based on a multivariate regression model is designed for imputing long-term missing variables, and a combination model based on single-dimensional interpolation and multivariate regression is proposed for imputing short-term missing variables. The Long Short-Term Memory (LSTM) model is utilized to impute both short-term and long-term missing samples. Finally, a series of experiments for different categories of missing data were conducted based on a real-world batch process monitoring dataset. The results demonstrate that the proposed method achieves higher imputation accuracy than other comparative methods. MDPI 2023-10-24 /pmc/articles/PMC10650138/ /pubmed/37960379 http://dx.doi.org/10.3390/s23218678 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Gan, Qihong
Gong, Lang
Hu, Dasha
Jiang, Yuming
Ding, Xuefeng
A Hybrid Missing Data Imputation Method for Batch Process Monitoring Dataset
title A Hybrid Missing Data Imputation Method for Batch Process Monitoring Dataset
title_sort hybrid missing data imputation method for batch process monitoring dataset
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650138/
https://www.ncbi.nlm.nih.gov/pubmed/37960379
http://dx.doi.org/10.3390/s23218678
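
The description field above says that missing data is first classified into five categories based on the continuous missing duration and the number of variables missing simultaneously. The following is a minimal sketch of that classification step only, not the authors' implementation. The thresholds TRANSIENT_MAX and SHORT_TERM_MAX, the category labels, the function names, and the reading of "missing sample" as a time step where every variable is missing are all assumptions made for illustration; the record does not state them.

```python
import math

# Hypothetical thresholds, in time steps. The record does not give the
# values used in the paper.
TRANSIENT_MAX = 2
SHORT_TERM_MAX = 10


def missing_runs(column):
    """Yield (start, length) for each run of consecutive NaNs in a column."""
    start = None
    for i, value in enumerate(column):
        if math.isnan(value):
            if start is None:
                start = i
        elif start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(column) - start


def classify_gaps(rows):
    """Label each gap by its duration and by whether whole samples are missing.

    rows is a list of time steps, each a list of floats with NaN for
    missing readings. Returns (variable_index, start, length, label) tuples.
    """
    n_vars = len(rows[0])
    # Assumption: a time step is a "missing sample" when every variable is NaN.
    full_row_missing = [all(math.isnan(v) for v in row) for row in rows]
    gaps = []
    for j in range(n_vars):
        column = [row[j] for row in rows]
        for start, length in missing_runs(column):
            whole_sample = all(full_row_missing[start:start + length])
            if whole_sample:
                label = ("short_term_sample" if length <= SHORT_TERM_MAX
                         else "long_term_sample")
            elif length <= TRANSIENT_MAX:
                label = "transient_isolated"
            elif length <= SHORT_TERM_MAX:
                label = "short_term_variable"
            else:
                label = "long_term_variable"
            gaps.append((j, start, length, label))
    return gaps


if __name__ == "__main__":
    nan = float("nan")
    demo = [[1.0, 2.0], [nan, 2.1], [1.2, 2.2], [nan, nan], [1.4, 2.4]]
    for gap in classify_gaps(demo):
        print(gap)
```

In this sketch a gap that is only partly covered by fully missing time steps is treated as a variable gap. Each label would then be routed to the imputation model the description names for that category.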