Genetic Algorithm-Based Partial Least-Squares with Only the First Component for Model Interpretation
Main author: | Kaneko, Hiromasa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8928558/ https://www.ncbi.nlm.nih.gov/pubmed/35309472 http://dx.doi.org/10.1021/acsomega.1c07379 |
author | Kaneko, Hiromasa |
collection | PubMed |
description | In the fields of molecular design, material design, process design, and process control, it is important not only to construct models with high predictive ability between explanatory variables X and objective variables y but also to interpret the constructed models to clarify phenomena and elucidate mechanisms in these fields. However, even in linear models, it is risky to interpret regression coefficients as the contributions of X to y because of multicollinearity among X. This study therefore focuses on partial least-squares with only the first component (PLSFC), whose regression coefficients can be used as the contributions of X to y. In addition, the study proposes selecting, with a genetic algorithm (GA), the combination of X that yields a predictive PLSFC model; this method is called GA-based PLSFC (GA-PLSFC). The resulting model would have both high predictive ability and high interpretability, with regression coefficients that can be defined as the contributions of X to y. The effectiveness of PLSFC and GA-PLSFC is verified using numerically simulated data sets and real material data sets, and the proposed method was found to construct predictive models with high interpretability. The Python code for GA-PLSFC is available at https://github.com/hkaneko1985/dcekit. |
id | pubmed-8928558 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | ACS Omega |
publication date | 2022-03-04 |
rights | © 2022 The Author. Published by American Chemical Society. License: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works. |
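The abstract describes PLSFC and GA-PLSFC only in outline. As a rough illustration, the Python sketch below (standard library only, and not the author's dcekit code) assumes autoscaled X and y and relies on the standard one-component PLS result that the weight vector is proportional to X^T y, so each regression coefficient is proportional to that variable's covariance with y. The GA details (bit-mask individuals, binary tournament selection, uniform crossover, bit-flip mutation, elitism, 5-fold cross-validated r² as fitness), the demo data, and all names (`fit_plsfc`, `predict_plsfc`, `r2_cv`, `ga_plsfc`, `make_demo_data`, `pop_size`, `generations`, `p_mut`) are hypothetical choices made for this example.

```python
# Hypothetical sketch of PLSFC plus GA variable selection; not the dcekit code.
import math
import random
import statistics


def fit_plsfc(X, y):
    """Fit PLS with only the first component (PLSFC) on autoscaled X and y.
    Returns (x_mean, x_std, y_mean, y_std, coef); each standardized coef is
    proportional to its column's covariance with y."""
    n, m = len(X), len(X[0])
    cols = list(zip(*X))
    x_mean = [statistics.fmean(c) for c in cols]
    x_std = [statistics.stdev(c) or 1.0 for c in cols]  # avoid dividing by 0
    y_mean, y_std = statistics.fmean(y), statistics.stdev(y)
    Xs = [[(row[j] - x_mean[j]) / x_std[j] for j in range(m)] for row in X]
    ys = [(v - y_mean) / y_std for v in y]
    # First PLS weight vector: w is proportional to Xs^T ys.
    w = [sum(Xs[i][j] * ys[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("no X column covaries with y")
    w = [v / norm for v in w]
    t = [sum(r[j] * w[j] for j in range(m)) for r in Xs]  # first score vector
    q = sum(a * b for a, b in zip(t, ys)) / sum(a * a for a in t)  # y loading
    return x_mean, x_std, y_mean, y_std, [q * wj for wj in w]


def predict_plsfc(model, X):
    """Predict y in original units from a fitted PLSFC model."""
    x_mean, x_std, y_mean, y_std, coef = model
    return [y_mean + y_std * sum(c * (row[j] - x_mean[j]) / x_std[j]
                                 for j, c in enumerate(coef)) for row in X]


def r2_cv(X, y, folds=5):
    """Cross-validated r^2 of PLSFC with interleaved k-fold splits."""
    n = len(y)
    pred = [0.0] * n
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        test = [i for i in range(n) if i % folds == k]
        model = fit_plsfc([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict_plsfc(model, [X[i] for i in test])):
            pred[i] = p
    y_bar = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def ga_plsfc(X, y, pop_size=20, generations=20, p_mut=0.1, seed=0):
    """Search column subsets (bit masks) that maximize r2_cv of PLSFC."""
    rng = random.Random(seed)
    m = len(X[0])

    def fitness(mask):
        idx = [j for j in range(m) if mask[j]]
        if not idx:
            return float("-inf")  # an empty subset cannot be modeled
        return r2_cv([[row[j] for j in idx] for row in X], y)

    def tournament(pop, scores):
        a, b = rng.sample(range(len(pop)), 2)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    pop = [[rng.random() < 0.5 for _ in range(m)] for _ in range(pop_size)]
    best, best_score = pop[0][:], float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ind[:], s
        children = [best[:]]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = children
    return [j for j in range(m) if best[j]], best_score


def make_demo_data(n=60, seed=1):
    """Hypothetical data: y = 2*x0 + x1 + noise; x2-x4 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(5)]
        x.append(x[0] + rng.gauss(0, 0.1))  # x5 is nearly collinear with x0
        X.append(x)
        y.append(2 * x[0] + x[1] + rng.gauss(0, 0.1))
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    coef = fit_plsfc(X, y)[4]
    print("standardized coefficients:", [round(c, 3) for c in coef])
    selected, score = ga_plsfc(X, y)
    print("GA-selected columns:", selected, "r2cv:", round(score, 3))
```

In the demo data, columns 0 and 5 are nearly collinear. Because each PLSFC coefficient depends only on its own column's covariance with y, the two receive nearly equal positive coefficients, whereas ordinary least squares can split the effect between them unstably. The GA then favors subsets whose PLSFC model has a higher cross-validated r².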