Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach

Bibliographic details
Main authors: Guo, Ganggui; Li, Shanshan; Liu, Yakun; Cao, Ze; Deng, Yangyu
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9819684/
https://www.ncbi.nlm.nih.gov/pubmed/36613022
http://dx.doi.org/10.3390/ijerph20010702
collection PubMed
description The cavity length, a vital index in aeration and corrosion reduction engineering, is affected by many factors and is challenging to calculate. In this study, 10-fold cross-validation was performed to select the optimal input configuration. The hyperparameters of three ensemble learning models, namely random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting tree (XGBOOST), were fine-tuned with the Bayesian optimization (BO) algorithm to improve prediction accuracy, and the models were compared with five empirical methods. XGBOOST achieved the highest prediction accuracy. A further interpretability analysis using the Sobol method showed that XGBOOST reasonably captures how the relative importance of the input features varies under different flow conditions. The Sobol sensitivity analysis also revealed two patterns in how machine learning (ML) models extract information from the input features: (1) the ensemble learning models rely on the main effects of individual features, and (2) support vector regression (SVR) relies on the interaction effects between features. The models that draw on individual-feature information predict the cavity length more accurately than the model that draws on interaction information. Moreover, XGBOOST extracts more correct information from the features, so its Sobol indices vary in accordance with the observed physical phenomena, and its predictions fit the experimental points best.
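The description contrasts models that rely on the main (first-order) effects of individual features with a model (SVR) that relies on interactions between features. As an illustration only, and not the authors' code, the following sketch estimates first-order (S_i) and total-order (ST_i) Sobol indices by plain Monte Carlo, using a Saltelli-style first-order estimator and the Jansen total-order estimator. It assumes independent inputs that are uniform on [-1, 1]. The two toy functions `additive` and `interactive` are hypothetical stand-ins for a main-effect model and an interaction-driven model.

```python
import random


def sobol_indices(model, dim, n_samples=20000, low=-1.0, high=1.0, seed=0):
    """Monte Carlo estimate of first-order (S_i) and total-order (ST_i) Sobol indices.

    Assumes independent inputs, each uniform on [low, high]. Uses two
    independent sample matrices A and B, a Saltelli-style first-order
    estimator and the Jansen total-order estimator.
    """
    rng = random.Random(seed)
    A = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_samples)]
    B = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_samples)]
    f_a = [model(row) for row in A]
    f_b = [model(row) for row in B]

    # Total output variance, estimated from all 2N model evaluations.
    y = f_a + f_b
    mean_y = sum(y) / len(y)
    var_y = sum((v - mean_y) ** 2 for v in y) / len(y)

    first, total = [], []
    for i in range(dim):
        # AB_i: each row of A with column i replaced by the same row of B.
        f_ab = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # First-order variance: Var(E[Y | x_i]).
        v_i = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)) / n_samples
        # Total-order variance: E[Var(Y | all inputs except x_i)].
        vt_i = sum((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab)) / (2 * n_samples)
        first.append(v_i / var_y)
        total.append(vt_i / var_y)
    return first, total


def additive(x):
    # Hypothetical toy model with main effects only (no interaction).
    return 2.0 * x[0] + x[1]


def interactive(x):
    # Hypothetical toy model with a pure interaction: neither input acts alone.
    return x[0] * x[1]


if __name__ == "__main__":
    for name, f in (("additive", additive), ("interactive", interactive)):
        s1, st = sobol_indices(f, dim=2)
        print(name, "S_i:", [round(v, 2) for v in s1], "ST_i:", [round(v, 2) for v in st])
```

For these inputs, the analytical values for `additive` are S = ST = (0.8, 0.2). For `interactive`, they are S = (0, 0) and ST = (1, 1). The Monte Carlo estimates should land close to these values. When ST_i is much larger than S_i, feature i acts mainly through interactions, which is the pattern the description attributes to SVR. When ST_i ≈ S_i, the feature acts mainly through its main effect, which is the pattern attributed to the ensemble models.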
id pubmed-9819684
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Int J Environ Res Public Health (MDPI), published 2022-12-30
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).