
Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly

While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from experimental design is extremely challenging, requiring time- and labor-intensive creation of empirical ph...


Bibliographic Details
Main authors: Lu, Yiwen, Yalcin, Dilek, Pigram, Paul J., Blackman, Lewis D., Boley, Mario
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10268968/
https://www.ncbi.nlm.nih.gov/pubmed/37208794
http://dx.doi.org/10.1021/acs.jcim.3c00460
author Lu, Yiwen
Yalcin, Dilek
Pigram, Paul J.
Blackman, Lewis D.
Boley, Mario
collection PubMed
description While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from experimental design is extremely challenging, requiring time- and labor-intensive creation of empirical phase diagrams whenever self-assemblies of novel monomer pairs are sought for specific applications. To alleviate this burden, we develop here the first framework for a data-driven methodology for the probabilistic modeling of PISA morphologies based on a selection and suitable adaptation of statistical machine learning methods. As the complexity of PISA precludes generating large volumes of training data with in silico simulations, we focus on interpretable low-variance methods that can be interrogated for conformity with chemical intuition and that promise to work well with only 592 training data points, which we curated from the PISA literature. We found that among the evaluated linear models, generalized additive models, and rule and tree ensembles, all but the linear models show a decent interpolation performance with around 0.2 estimated error rate and 1 bit expected cross-entropy loss (surprisal) when predicting the mixture of morphologies formed from monomer pairs already encountered in the training data. When considering extrapolation to new monomer combinations, the model performance is weaker, but the best model (random forest) still achieves highly nontrivial prediction performance (0.27 error rate, 1.6 bit surprisal), which renders it a good candidate to support the creation of empirical phase diagrams for new monomers and conditions. Indeed, we find in three case studies that, when used to actively learn phase diagrams, the model is able to select a smart set of experiments that leads to satisfactory phase diagrams after observing only relatively few data points (5–16) for the targeted conditions. The data set as well as all model training and evaluation code are publicly available through the GitHub repository of the last author.
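The abstract reports two evaluation metrics: an error rate and an expected cross-entropy loss in bits (surprisal). The following is a minimal Python sketch of how such metrics can be computed, not the authors' evaluation code. It assumes a simplified setting in which each sample has exactly one observed morphology, whereas the paper predicts a mixture of morphologies. The morphology names, the predicted probabilities, and the function names `mean_surprisal_bits` and `error_rate` are hypothetical and chosen only for illustration.

```python
import math


def mean_surprisal_bits(probs, labels):
    """Average of -log2 p(observed label) over all samples, in bits."""
    return sum(-math.log2(p[y]) for p, y in zip(probs, labels)) / len(labels)


def error_rate(probs, labels):
    """Fraction of samples whose most probable label is not the observed one."""
    wrong = sum(max(p, key=p.get) != y for p, y in zip(probs, labels))
    return wrong / len(labels)


# Hypothetical predicted distributions over three illustrative morphologies.
predictions = [
    {"sphere": 0.5, "worm": 0.25, "vesicle": 0.25},
    {"sphere": 0.25, "worm": 0.5, "vesicle": 0.25},
    {"sphere": 0.5, "worm": 0.25, "vesicle": 0.25},
    {"sphere": 0.25, "worm": 0.25, "vesicle": 0.5},
]
# Hypothetical observed outcomes for the same four samples.
observed = ["sphere", "worm", "vesicle", "vesicle"]

print(mean_surprisal_bits(predictions, observed))  # 1.25
print(error_rate(predictions, observed))           # 0.25
```

In this toy data, three samples assign probability 0.5 to the observed label (1 bit each) and one assigns 0.25 (2 bits), giving a mean of 1.25 bits. For reference, a uniform guess over k outcomes always costs log2(k) bits, so lower values indicate a more informative probabilistic model.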
format Online
Article
Text
id pubmed-10268968
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10268968 2023-06-16 Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly Lu, Yiwen Yalcin, Dilek Pigram, Paul J. Blackman, Lewis D. Boley, Mario J Chem Inf Model American Chemical Society 2023-05-19 /pmc/articles/PMC10268968/ /pubmed/37208794 http://dx.doi.org/10.1021/acs.jcim.3c00460 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10268968/
https://www.ncbi.nlm.nih.gov/pubmed/37208794
http://dx.doi.org/10.1021/acs.jcim.3c00460