Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly
While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from experimental design is extremely challenging, requiring time and work-intensive creation of empirical ph...
Main authors: | Lu, Yiwen; Yalcin, Dilek; Pigram, Paul J.; Blackman, Lewis D.; Boley, Mario |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10268968/ https://www.ncbi.nlm.nih.gov/pubmed/37208794 http://dx.doi.org/10.1021/acs.jcim.3c00460 |
_version_ | 1785059141944868864 |
---|---|
author | Lu, Yiwen Yalcin, Dilek Pigram, Paul J. Blackman, Lewis D. Boley, Mario |
author_facet | Lu, Yiwen Yalcin, Dilek Pigram, Paul J. Blackman, Lewis D. Boley, Mario |
author_sort | Lu, Yiwen |
collection | PubMed |
description | While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from experimental design is extremely challenging, requiring time and work-intensive creation of empirical phase diagrams whenever self-assemblies of novel monomer pairs are sought for specific applications. To alleviate this burden, we develop here the first framework for a data-driven methodology for the probabilistic modeling of PISA morphologies based on a selection and suitable adaption of statistical machine learning methods. As the complexity of PISA precludes generating large volumes of training data with in silico simulations, we focus on interpretable low variance methods that can be interrogated for conformity with chemical intuition and that promise to work well with only 592 training data points which we curated from the PISA literature. We found that among the evaluated linear models, generalized additive models, and rule and tree ensembles, all but the linear models show a decent interpolation performance with around 0.2 estimated error rate and 1 bit expected cross entropy loss (surprisal) when predicting the mixture of morphologies formed from monomer pairs already encountered in the training data. When considering extrapolation to new monomer combinations, the model performance is weaker but the best model (random forest) still achieves highly nontrivial prediction performance (0.27 error rate, 1.6 bit surprisal), which renders it a good candidate to support the creation of empirical phase diagrams for new monomers and conditions. Indeed, we find in three case studies that, when used to actively learn phase diagrams, the model is able to select a smart set of experiments that lead to satisfactory phase diagrams after observing only relatively few data points (5–16) for the targeted conditions. The data set as well as all model training and evaluation codes are publicly available through the GitHub repository of the last author. |
format | Online Article Text |
id | pubmed-10268968 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10268968 2023-06-16 Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly Lu, Yiwen Yalcin, Dilek Pigram, Paul J. Blackman, Lewis D. Boley, Mario J Chem Inf Model American Chemical Society 2023-05-19 /pmc/articles/PMC10268968/ /pubmed/37208794 http://dx.doi.org/10.1021/acs.jcim.3c00460 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Lu, Yiwen Yalcin, Dilek Pigram, Paul J. Blackman, Lewis D. Boley, Mario Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly |
title | Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly |
title_full | Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly |
title_fullStr | Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly |
title_full_unstemmed | Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly |
title_short | Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly |
title_sort | interpretable machine learning models for phase prediction in polymerization-induced self-assembly |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10268968/ https://www.ncbi.nlm.nih.gov/pubmed/37208794 http://dx.doi.org/10.1021/acs.jcim.3c00460 |
work_keys_str_mv | AT luyiwen interpretablemachinelearningmodelsforphasepredictioninpolymerizationinducedselfassembly AT yalcindilek interpretablemachinelearningmodelsforphasepredictioninpolymerizationinducedselfassembly AT pigrampaulj interpretablemachinelearningmodelsforphasepredictioninpolymerizationinducedselfassembly AT blackmanlewisd interpretablemachinelearningmodelsforphasepredictioninpolymerizationinducedselfassembly AT boleymario interpretablemachinelearningmodelsforphasepredictioninpolymerizationinducedselfassembly |