
Genetic Algorithm-Based Partial Least-Squares with Only the First Component for Model Interpretation


Bibliographic Details
Main author: Kaneko, Hiromasa
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8928558/
https://www.ncbi.nlm.nih.gov/pubmed/35309472
http://dx.doi.org/10.1021/acsomega.1c07379
author Kaneko, Hiromasa
collection PubMed
description In molecular design, material design, process design, and process control, it is important not only to construct models with high predictive ability between explanatory variables X and objective variables y but also to interpret the constructed models to clarify phenomena and elucidate mechanisms in those fields. However, even in linear models, it is risky to interpret regression coefficients as contributions of X to y because of multicollinearity among X. This study therefore focuses on the partial least-squares model with only the first component (PLSFC), for which the regression coefficients can be used as contributions of X to y. In addition, the study proposes using a genetic algorithm (GA) to select the combination of X that yields a predictive PLSFC model; this method is called GA-based PLSFC (GA-PLSFC). The resulting model is expected to have both high predictive ability and high interpretability, with regression coefficients that can be defined as contributions of X to y. The effectiveness of PLSFC and GA-PLSFC is verified using numerically simulated data sets and real material data sets. The proposed method was found to be capable of constructing predictive models with high interpretability. The Python code for GA-PLSFC is available at https://github.com/hkaneko1985/dcekit.
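The claim that first-component coefficients can be read as contributions can be illustrated with a minimal sketch. This is not the dcekit implementation; the function name fit_plsfc, the simulated data, and the autoscaling choice are assumptions made for illustration. For a single-y PLS model with one component, the weight vector is proportional to X^T y on autoscaled data, so each regression coefficient is proportional to the correlation between that variable and y, even when the X variables are strongly collinear.

```python
import numpy as np


def fit_plsfc(X, y):
    """Hypothetical helper: one-component PLS on autoscaled X and y.

    Returns regression coefficients in autoscaled units.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    w = Xs.T @ ys                # first weight vector, proportional to cov(X, y)
    w = w / np.linalg.norm(w)
    t = Xs @ w                   # first (and only) score vector
    q = (t @ ys) / (t @ t)       # regression of y on the score
    return w * q                 # coefficients: b = w * q


# Simulated data (an assumption): two nearly collinear variables plus one noise variable.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([
    latent + 0.05 * rng.normal(size=n),
    latent + 0.05 * rng.normal(size=n),
    rng.normal(size=n),
])
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n)

b = fit_plsfc(X, y)
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

print("PLSFC coefficients:", b)
print("correlation with y:", corr)
```

In this sketch the two collinear variables receive coefficients of the same sign and similar size, whereas ordinary least squares on the same data could assign them unstable values. The GA step described in the abstract would search over subsets of columns of X and score each subset by the predictive ability of its PLSFC model; that search is not shown here.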
format Online
Article
Text
id pubmed-8928558
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-8928558 2022-03-18 Genetic Algorithm-Based Partial Least-Squares with Only the First Component for Model Interpretation. Kaneko, Hiromasa. ACS Omega. American Chemical Society, 2022-03-04. /pmc/articles/PMC8928558/ /pubmed/35309472 http://dx.doi.org/10.1021/acsomega.1c07379 Text en. © 2022 The Author. Published by American Chemical Society. License: https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.
title Genetic Algorithm-Based Partial Least-Squares with Only the First Component for Model Interpretation