Improved Random Forest Algorithm Based on Decision Paths for Fault Diagnosis of Chemical Process with Incomplete Data

Fault detection and diagnosis (FDD) has received considerable attention with the advent of big data. Many data-driven FDD procedures have been proposed, but most of them may not be accurate when data are missing. Therefore, this paper proposes an improved random forest (RF) based on decision path...

Bibliographic Details
Main authors: Zhang, Yuequn; Luo, Lei; Ji, Xu; Dai, Yiyang
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538123/
https://www.ncbi.nlm.nih.gov/pubmed/34695927
http://dx.doi.org/10.3390/s21206715
author Zhang, Yuequn
Luo, Lei
Ji, Xu
Dai, Yiyang
collection PubMed
description Fault detection and diagnosis (FDD) has received considerable attention with the advent of big data. Many data-driven FDD procedures have been proposed, but most of them may not be accurate when data are missing. Therefore, this paper proposes an improved random forest (RF) based on decision paths, named DPRF, which uses correction coefficients to compensate for the influence of incomplete data. In the DPRF model, complete training samples are first used to grow all the decision trees in the RF. Then, for each test sample that may contain missing values, the decision paths and the importance scores of the corresponding nodes are obtained, so that a reliability score for the sample can be inferred for each tree in the RF. Each decision tree's prediction for the sample is thus paired with a reliability score. The final prediction is obtained by majority voting that combines the predictions with their reliability scores. To demonstrate the feasibility and effectiveness of the proposed method, it is tested on the Tennessee Eastman (TE) process. Compared with other FDD methods, the proposed DPRF model performs better on incomplete data.
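The voting scheme described in the abstract can be illustrated with a small sketch. This is not the authors' DPRF implementation: the abstract does not give the reliability formula, so the one used here (one minus the share of path importance that falls on nodes testing a missing feature) is an assumption, as are the nested-dict tree layout, the `fill_values` used to traverse past a missing feature, and all names and numbers.

```python
import math
from collections import defaultdict

# Hypothetical tree layout: split nodes carry a feature index, a threshold,
# an importance score and two children; leaves carry only a class label.

def predict_with_reliability(tree, sample, fill_values):
    """Follow the decision path of one tree; return (label, reliability)."""
    total = 0.0       # importance of all split nodes on the path
    unreliable = 0.0  # importance of split nodes that tested a missing feature
    node = tree
    while "label" not in node:
        feature = node["feature"]
        value = sample[feature]
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing:
            # Assumption: substitute a fill value so the traversal can continue.
            value = fill_values[feature]
            unreliable += node["importance"]
        total += node["importance"]
        node = node["left"] if value <= node["threshold"] else node["right"]
    # Assumed reliability formula; a path with no splits is fully reliable.
    reliability = 1.0 if total == 0 else 1.0 - unreliable / total
    return node["label"], reliability

def weighted_vote(forest, sample, fill_values):
    """Majority vote in which each tree's vote counts by its reliability."""
    votes = defaultdict(float)
    for tree in forest:
        label, reliability = predict_with_reliability(tree, sample, fill_values)
        votes[label] += reliability
    return max(votes, key=votes.get)

def stump(feature):
    """Made-up one-split tree for the demonstration."""
    return {"feature": feature, "threshold": 0.5, "importance": 1.0,
            "left": {"label": "normal"}, "right": {"label": "fault"}}

forest = [stump(0), stump(1), stump(1)]
sample = [0.9, None]      # feature 1 is missing
fill_values = [0.0, 0.0]  # hypothetical per-feature fill values

print(weighted_vote(forest, sample, fill_values))
```

In this toy case two of the three trees vote "normal", but both do so only because they split on the missing feature, so their reliability is zero and the weighted vote follows the one tree that saw real data.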
format Online
Article
Text
id pubmed-8538123
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8538123 2021-10-24 Improved Random Forest Algorithm Based on Decision Paths for Fault Diagnosis of Chemical Process with Incomplete Data. Zhang, Yuequn; Luo, Lei; Ji, Xu; Dai, Yiyang. Sensors (Basel). Article. MDPI 2021-10-09 /pmc/articles/PMC8538123/ /pubmed/34695927 http://dx.doi.org/10.3390/s21206715 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Improved Random Forest Algorithm Based on Decision Paths for Fault Diagnosis of Chemical Process with Incomplete Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8538123/
https://www.ncbi.nlm.nih.gov/pubmed/34695927
http://dx.doi.org/10.3390/s21206715