LGB-Stack: Stacked Generalization with LightGBM for Highly Accurate Predictions of Polymer Bandgap

Bibliographic Details
Main Authors: Goh, Kai Leong; Goto, Atsushi; Lu, Yunpeng
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9434625/
https://www.ncbi.nlm.nih.gov/pubmed/36061712
http://dx.doi.org/10.1021/acsomega.2c02554
author Goh, Kai Leong
Goto, Atsushi
Lu, Yunpeng
collection PubMed
description Recently, the Ramprasad group reported a quantitative structure–property relationship (QSPR) model for predicting the E_gap values of 4209 polymers, which yielded a test set R² score of 0.90 and a test set root-mean-square error (RMSE) score of 0.44 at a train/test split ratio of 80/20. In this paper, we present a new QSPR model named LGB-Stack, which performs a two-level stacked generalization using the light gradient boosting machine. At level 1, multiple weak models are trained, and at level 2, they are combined into a strong final model. Four molecular fingerprints were generated from the simplified molecular input line entry system notations of the polymers. They were trimmed using recursive feature elimination and used as the initial input features for training the weak models. The output predictions of the weak models were used as the new input features for training the final model, which completes the LGB-Stack model training process. Our results show that the best test set R² and RMSE scores of LGB-Stack at the train/test split ratio of 80/20 were 0.92 and 0.41, respectively. The accuracy scores further improved to 0.94 and 0.34, respectively, when the train/test split ratio of 95/5 was used.
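The two-level scheme described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. A small ridge regressor stands in for LightGBM, random bit vectors stand in for the four molecular fingerprints, recursive feature elimination is omitted, and the out-of-fold scheme for building the level-2 inputs is an assumption, since the abstract does not say how the weak-model predictions were generated. All function names (`fit_ridge`, `stack_fit`, `stack_predict`) are hypothetical.

```python
import random

def fit_ridge(X, y, lam=1e-3):
    """Ridge regression with an intercept, solved via the normal equations.
    Stand-in for a LightGBM regressor (assumption, not the paper's learner)."""
    rows = [[1.0] + list(r) for r in X]              # prepend a bias column
    d = len(rows[0])
    # Augmented system [A | b] with A = X^T X + lam*I and b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(d)]
    for c in range(d):                               # Gauss-Jordan, partial pivoting
        p = max(range(c, d), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for k in range(d):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][d] / A[i][i] for i in range(d)]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def rmse(y, p):
    return (sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)) ** 0.5

def r2(y, p):
    m = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    return 1.0 - ss_res / sum((a - m) ** 2 for a in y)

def stack_fit(views, y, k=5):
    """views: one feature matrix per fingerprint, all in the same row order."""
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]
    oof = [[0.0] * len(views) for _ in range(n)]     # out-of-fold level-1 outputs
    for v, X in enumerate(views):
        for hold in folds:
            held = set(hold)
            tr = [i for i in range(n) if i not in held]
            w = fit_ridge([X[i] for i in tr], [y[i] for i in tr])
            for i, p in zip(hold, predict(w, [X[i] for i in hold])):
                oof[i][v] = p
    level1 = [fit_ridge(X, y) for X in views]        # refit weak models on all rows
    level2 = fit_ridge(oof, y)                       # final model on weak outputs
    return level1, level2

def stack_predict(model, views):
    level1, level2 = model
    cols = [predict(w, X) for w, X in zip(level1, views)]
    return predict(level2, [list(r) for r in zip(*cols)])

# Synthetic stand-in data: 4 "fingerprints" of 6 bits each, 300 samples.
rng = random.Random(0)
n, n_views, bits = 300, 4, 6
views = [[[float(rng.randint(0, 1)) for _ in range(bits)] for _ in range(n)]
         for _ in range(n_views)]
true_w = [[rng.uniform(-1, 1) for _ in range(bits)] for _ in range(n_views)]
y = [sum(sum(w * x for w, x in zip(true_w[v], views[v][i]))
         for v in range(n_views)) + rng.gauss(0, 0.1) for i in range(n)]

cut = int(0.8 * n)                                   # 80/20 train/test split
train_views = [X[:cut] for X in views]
test_views = [X[cut:] for X in views]
y_tr, y_te = y[:cut], y[cut:]

model = stack_fit(train_views, y_tr)
pred = stack_predict(model, test_views)
r2_stack, rmse_stack = r2(y_te, pred), rmse(y_te, pred)
weak_rmse = [rmse(y_te, predict(w, X)) for w, X in zip(model[0], test_views)]
print("stacked R2:", round(r2_stack, 3), "stacked RMSE:", round(rmse_stack, 3))
print("weak-model RMSEs:", [round(e, 3) for e in weak_rmse])
```

In this toy setup each weak model sees only one feature group, so the level-2 model is what recombines their partial information. Fitting level 2 on held-out predictions, rather than on predictions for rows the weak models were trained on, is a common way to avoid leaking the training targets into the final model.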
format Online
Article
Text
id pubmed-9434625
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9434625 2022-09-02 LGB-Stack: Stacked Generalization with LightGBM for Highly Accurate Predictions of Polymer Bandgap. Goh, Kai Leong; Goto, Atsushi; Lu, Yunpeng. ACS Omega. American Chemical Society, 2022-08-15. /pmc/articles/PMC9434625/ /pubmed/36061712 http://dx.doi.org/10.1021/acsomega.2c02554 Text en © 2022 The Authors. Published by American Chemical Society. License: https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.
title LGB-Stack: Stacked Generalization with LightGBM for Highly Accurate Predictions of Polymer Bandgap