Prediction of the Critical Temperature of Superconductors Based on Two-Layer Feature Selection and the Optuna-Stacking Ensemble Learning Model

Bibliographic Details
Main authors: Yu, Jiahao; Zhao, Yongman; Pan, Rongshun; Zhou, Xue; Wei, Zikai
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9878623/
https://www.ncbi.nlm.nih.gov/pubmed/36713747
http://dx.doi.org/10.1021/acsomega.2c06324
collection PubMed
description The critical temperature (Tc) of superconductors has long attracted research interest. This study proposes a method that combines two-layer feature selection (TL) with an Optuna-Stacking ensemble learning model to predict Tc from physicochemical components. Most machine-learning models require substantial prior knowledge to construct Tc-related feature vectors manually, so those vectors may contain redundant or invalid features that degrade the analysis and prediction of Tc. The TL model combines the advantages of filter and wrapper feature selection. In the first layer, features are ranked by importance using SHapley Additive exPlanations (SHAP) together with CatBoost; the maximal information coefficient (MIC) and the distance correlation coefficient (DCC) are then applied to this importance ranking to make an initial feature selection. The second layer uses a cross-validation-based genetic algorithm (cv-GA) to eliminate the remaining redundant or invalid features. The selected features are fed into a Stacking ensemble learning model to predict Tc, and the multidimensional hyperparameter optimization of the metamodel is performed with Optuna, an improved Bayesian hyperparameter optimization framework based on the Tree-structured Parzen Estimator (TPE) and a pruning strategy. The model shows clear advantages in both prediction performance and feature reduction rate, generalizes well, and proves suitable for predicting the Tc of high-temperature superconductors. It provides an efficient, low-cost method for data-driven superconductor research.
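The abstract does not say how the MIC and DCC scores are combined with the SHAP/CatBoost ranking. As a minimal sketch, assuming each feature is a plain Python list of numbers, the code below computes the sample distance correlation (the standard V-statistic form built from double-centred pairwise distance matrices) and uses it in a hypothetical redundancy filter, `filter_redundant`, that walks an importance-ranked feature list and drops any feature whose distance correlation with an already-kept feature reaches a threshold. The filter, its name, and the 0.9 threshold are illustrative assumptions, not the authors' procedure.

```python
import math
from typing import Dict, List, Sequence


def _double_centered(values: Sequence[float]) -> List[List[float]]:
    """Pairwise |x_j - x_k| distances, double-centred by row, column and grand means."""
    n = len(values)
    dist = [[abs(values[j] - values[k]) for k in range(n)] for j in range(n)]
    row_means = [sum(row) / n for row in dist]
    # The distance matrix is symmetric, so column means equal row means.
    grand_mean = sum(row_means) / n
    return [
        [dist[j][k] - row_means[j] - row_means[k] + grand_mean for k in range(n)]
        for j in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation (V-statistic form) of two equal-length 1-D samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _double_centered(x)
    b = _double_centered(y)
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(dcov2, 0.0) / denom)


def filter_redundant(
    ranked_features: List[str],
    columns: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Hypothetical filter: keep a feature only if it is not strongly dependent
    (distance correlation >= threshold) on any higher-ranked kept feature."""
    kept: List[str] = []
    for name in ranked_features:
        if all(distance_correlation(columns[name], columns[k]) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, with `x = [1.0, 2.0, 3.0, 4.0, 5.0]`, `distance_correlation(x, [2 * v + 3 for v in x])` is 1.0 up to rounding, because distance correlation equals 1 for any exact linear relation, so a feature that is a rescaled copy of a higher-ranked one would be dropped by the filter.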
id pubmed-9878623
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal ACS Omega (record pubmed-9878623, 2023-01-27)
rights American Chemical Society, 2023-01-13. © 2023 The Authors. Published by American Chemical Society under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial access and re-use provided that author attribution and integrity are maintained, but does not permit the creation of adaptations or other derivative works.