Prediction of the Critical Temperature of Superconductors Based on Two-Layer Feature Selection and the Optuna-Stacking Ensemble Learning Model
The critical temperature (Tc) of superconductors has long been a subject of interest. This study proposes a method that combines two-layer feature selection (TL) with an Optuna-Stacking ensemble learning model to predict Tc from physicochemical components. Since most machin...
Main authors: | Yu, Jiahao; Zhao, Yongman; Pan, Rongshun; Zhou, Xue; Wei, Zikai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9878623/ https://www.ncbi.nlm.nih.gov/pubmed/36713747 http://dx.doi.org/10.1021/acsomega.2c06324 |
_version_ | 1784878525538369536 |
---|---|
author | Yu, Jiahao Zhao, Yongman Pan, Rongshun Zhou, Xue Wei, Zikai |
author_sort | Yu, Jiahao |
collection | PubMed |
description | The critical temperature (Tc) of superconductors has long been a subject of interest. This study proposes a method that combines two-layer feature selection (TL) with an Optuna-Stacking ensemble learning model to predict Tc from physicochemical components. Most machine-learning models rely on extensive prior knowledge to construct the Tc-related feature vectors manually, so those vectors may contain redundant or invalid features that degrade the analysis and prediction of Tc. The TL model combines the advantages of filter and wrapper feature selection. In the first layer, feature importance is ranked by SHapley Additive exPlanations (SHAP) in combination with CatBoost; the maximal information coefficient (MIC) and the distance correlation coefficient (DCC) are then applied, following that importance ranking, for initial feature selection. The second layer uses a cross-validation-based genetic algorithm (cv-GA) to eliminate the remaining redundant or invalid features. The selected features are fed into the Stacking ensemble learning model to predict Tc, and the multidimensional hyperparameters of the metamodel are optimized with Optuna, an improved Bayesian hyperparameter optimization framework based on the Tree-structured Parzen Estimator (TPE) and a pruning strategy. The model shows clear advantages and generality in prediction performance and feature reduction rate, and it also proves suitable for predicting the Tc of high-temperature superconductors. It provides an efficient and cost-effective method for data-driven superconductor research. |
format | Online Article Text |
id | pubmed-9878623 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9878623 2023-01-27 ACS Omega American Chemical Society 2023-01-13 /pmc/articles/PMC9878623/ /pubmed/36713747 http://dx.doi.org/10.1021/acsomega.2c06324 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Prediction of the Critical Temperature of Superconductors Based on Two-Layer Feature Selection and the Optuna-Stacking Ensemble Learning Model |
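
The description above names the distance correlation coefficient (DCC) as one of the first-layer filters but does not spell out the procedure. As a rough illustration only, the sketch below computes the sample distance correlation in plain Python and uses it to drop features that are nearly redundant with a higher-ranked one. The function names, the greedy rule, and the 0.9 threshold are assumptions made for this example, not the authors' implementation; the importance ranking, which the record attributes to SHAP with CatBoost, is simply passed in as a list.

```python
import math
from typing import Dict, List, Sequence


def _centered_distances(v: Sequence[float]) -> List[List[float]]:
    """Pairwise |v_i - v_j| matrix, double-centered (row, column, grand mean)."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d]
    grand_mean = sum(row_mean) / n
    # d is symmetric, so the column means equal the row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation in [0, 1]; returns 0.0 for a constant input."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    # max() guards against a tiny negative value from floating-point error.
    return math.sqrt(max(dcov2, 0.0) / denom)


def drop_redundant(
    columns: Dict[str, Sequence[float]],
    ranked_names: Sequence[str],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Greedy filter (an assumption): walk features from most to least
    important and keep one only if its distance correlation with every
    already-kept feature stays below the threshold."""
    kept: List[str] = []
    for name in ranked_names:
        if all(
            distance_correlation(columns[name], columns[k]) < threshold
            for k in kept
        ):
            kept.append(name)
    return kept


if __name__ == "__main__":
    x = [-2.0, -1.0, 0.0, 1.0, 2.0]
    features = {
        "a": x,
        "a_scaled": [2 * v + 1 for v in x],  # linear copy of "a"
        "a_sq": [v * v for v in x],          # nonlinear function of "a"
    }
    print(drop_redundant(features, ["a", "a_scaled", "a_sq"]))
```

Distance correlation is a natural fit for this kind of filter because it responds to nonlinear dependence as well as linear dependence: on the symmetric sample in the demo, `a` and `a_sq` have zero Pearson correlation but a nonzero distance correlation. The pure-Python version costs O(n²) time and memory per feature pair, so a dataset of realistic size would call for a vectorized implementation.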