Prediction of the Critical Temperature of Superconductors Based on Two-Layer Feature Selection and the Optuna-Stacking Ensemble Learning Model
Main authors: | , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9878623/ https://www.ncbi.nlm.nih.gov/pubmed/36713747 http://dx.doi.org/10.1021/acsomega.2c06324 |
Summary: | The critical temperature (T(c)) of superconductors has long been a subject of research interest. This study proposes a method that combines two-layer feature selection (TL) with an Optuna-Stacking ensemble learning model to predict T(c) from physicochemical components. Most machine-learning models require a large amount of prior knowledge to construct the feature vectors associated with T(c) manually, so those vectors may contain redundant or invalid features that adversely affect the analysis and prediction of T(c). The TL model combines the advantages of filter and wrapper feature selection. In the first layer, features are ranked by importance using SHapley Additive exPlanations (SHAP) in combination with CatBoost; the maximal information coefficient (MIC) and the distance correlation coefficient (DCC) are then applied, in order of that ranking, for initial feature selection. The second layer uses a cross-validation-based genetic algorithm (cv-GA) to eliminate the remaining redundant or invalid features. The selected features are fed into the Stacking ensemble learning model to predict T(c), and the multidimensional hyperparameters of the meta-model are optimized with Optuna, an improved Bayesian hyperparameter optimization framework based on the Tree-structured Parzen Estimator (TPE) and a pruning strategy. The model shows clear advantages and generality in prediction performance and feature reduction rate, and it also proves suitable for predicting T(c) of high-temperature superconductors. It provides an efficient and cost-effective method for data-driven superconductor research. |
---|
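The first selection layer described in the summary can be illustrated with a small sketch. It is not the authors' implementation: it assumes an importance ranking is already available (in the paper this comes from SHAP with CatBoost), it uses only the distance correlation and omits MIC, and the helper name `drop_redundant` and the 0.9 threshold are hypothetical choices made for illustration.

```python
import math


def _centered_distances(values):
    """Return the double-centered pairwise distance matrix of a 1-D sample."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equally long 1-D samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / denom)


def drop_redundant(features, ranking, threshold=0.9):
    """Hypothetical helper: walk features from most to least important and
    keep one only if it is not strongly dependent on any feature kept so far.

    `features` maps a feature name to its list of values; `ranking` lists
    the names in descending importance; `threshold` is an assumed cutoff.
    """
    kept = []
    for name in ranking:
        if all(
            distance_correlation(features[name], features[k]) < threshold
            for k in kept
        ):
            kept.append(name)
    return kept
```

Processing features in ranking order means that, of two strongly dependent features, the more important one is retained. The pairwise computation is quadratic in the number of samples, so this sketch suits small illustrations rather than a full superconductor data set.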