Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach
The cavity length, which is a vital index in aeration and corrosion reduction engineering, is affected by many factors and is challenging to calculate. In this study, 10-fold cross-validation was performed to select the optimal input configuration. Additionally, the hyperparameters of three ensemble...
Main Authors: | Guo, Ganggui; Li, Shanshan; Liu, Yakun; Cao, Ze; Deng, Yangyu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9819684/ https://www.ncbi.nlm.nih.gov/pubmed/36613022 http://dx.doi.org/10.3390/ijerph20010702 |
_version_ | 1784865288472231936 |
---|---|
author | Guo, Ganggui Li, Shanshan Liu, Yakun Cao, Ze Deng, Yangyu |
author_facet | Guo, Ganggui Li, Shanshan Liu, Yakun Cao, Ze Deng, Yangyu |
author_sort | Guo, Ganggui |
collection | PubMed |
description | The cavity length, a vital index in aeration and corrosion reduction engineering, is affected by many factors and is challenging to calculate. In this study, 10-fold cross-validation was performed to select the optimal input configuration. Additionally, the hyperparameters of three ensemble learning models, random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting tree (XGBOOST), were fine-tuned by the Bayesian optimization (BO) algorithm to improve the prediction accuracy and to compare them with five empirical methods. The XGBOOST method presented the highest prediction accuracy. Further interpretability analysis using the Sobol method demonstrated its ability to reasonably capture the varying relative significance of different input features under different flow conditions. The Sobol sensitivity analysis also revealed two patterns of extracting information from the input features in ML models: (1) the main effect of individual features in ensemble learning and (2) the interactive effect between features in SVR. The models that rely on individual (main-effect) information predict the cavity length more accurately than the model that relies on interactive information. Among them, XGBOOST captures the most accurate information from the features: its Sobol indices vary in accordance with the observed phenomena, and its predictions fit the experimental points best. |
format | Online Article Text |
id | pubmed-9819684 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9819684 2023-01-07 Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach Guo, Ganggui Li, Shanshan Liu, Yakun Cao, Ze Deng, Yangyu Int J Environ Res Public Health Article MDPI 2022-12-30 /pmc/articles/PMC9819684/ /pubmed/36613022 http://dx.doi.org/10.3390/ijerph20010702 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Guo, Ganggui Li, Shanshan Liu, Yakun Cao, Ze Deng, Yangyu Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach |
title | Prediction of Cavity Length Using an Interpretable Ensemble Learning Approach |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9819684/ https://www.ncbi.nlm.nih.gov/pubmed/36613022 http://dx.doi.org/10.3390/ijerph20010702 |
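
The description above distinguishes two ways a model can use its inputs in a Sobol analysis: through the main effect of individual features, or through interactions between features. The sketch below illustrates that distinction only; it is not the authors' code. The two toy functions (`additive`, `interactive`) and the helper `sobol_indices` are hypothetical names, and the choice of estimators (a Saltelli-style first-order estimator and a Jansen-style total-order estimator on uniform [0, 1] inputs) is an assumption about one common approach, not something the record confirms.

```python
import random
from statistics import fmean, pvariance


def sobol_indices(model, dim, n=40000, seed=0):
    """Monte Carlo estimate of first-order and total-order Sobol indices.

    Inputs are assumed independent and uniform on [0, 1] (an assumption
    made for this illustration only).
    """
    rng = random.Random(seed)
    # Two independent sample matrices A and B, each with n rows.
    a = [[rng.random() for _ in range(dim)] for _ in range(n)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n)]

    f_a = [model(x) for x in a]
    f_b = [model(x) for x in b]

    # Centre the outputs; this lowers the estimator noise without
    # changing what is being estimated.
    mean = fmean(f_a + f_b)
    f_a = [v - mean for v in f_a]
    f_b = [v - mean for v in f_b]
    var = pvariance(f_a + f_b)

    first, total = [], []
    for i in range(dim):
        # A with column i taken from B: only feature i is resampled.
        f_ab = [
            model(ra[:i] + [rb[i]] + ra[i + 1:]) - mean
            for ra, rb in zip(a, b)
        ]
        # First-order (main effect) index of feature i.
        first.append(
            fmean(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)) / var
        )
        # Total-order index: main effect plus all interactions with feature i.
        total.append(
            0.5 * fmean((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab)) / var
        )
    return first, total


def additive(x):
    # Toy model driven purely by main effects.
    return x[0] + 2.0 * x[1]


def interactive(x):
    # Toy model driven purely by an interaction between zero-mean inputs.
    return (2.0 * x[0] - 1.0) * (2.0 * x[1] - 1.0)


if __name__ == "__main__":
    for name, fn in (("additive", additive), ("interactive", interactive)):
        s1, st = sobol_indices(fn, dim=2)
        print(name, [round(v, 2) for v in s1], [round(v, 2) for v in st])
```

For the additive toy model the analytic first-order and total-order indices coincide (0.2 and 0.8), so all of the variance is explained by main effects. For the interaction toy model the analytic first-order indices are 0 while the total-order indices are 1, so the variance is explained entirely by the interaction. The Monte Carlo estimates should land close to these values but will not match them exactly.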