Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly

While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from experimental design is extremely challenging, requiring time- and work-intensive creation of empirical ph...

Bibliographic Details
Main Authors:	Lu, Yiwen; Yalcin, Dilek; Pigram, Paul J.; Blackman, Lewis D.; Boley, Mario
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2023
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10268968/ https://www.ncbi.nlm.nih.gov/pubmed/37208794 http://dx.doi.org/10.1021/acs.jcim.3c00460

author	Lu, Yiwen; Yalcin, Dilek; Pigram, Paul J.; Blackman, Lewis D.; Boley, Mario
collection	PubMed
description	While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from experimental design is extremely challenging, requiring time- and work-intensive creation of empirical phase diagrams whenever self-assemblies of novel monomer pairs are sought for specific applications. To alleviate this burden, we develop here the first framework for a data-driven methodology for the probabilistic modeling of PISA morphologies based on a selection and suitable adaptation of statistical machine learning methods. As the complexity of PISA precludes generating large volumes of training data with in silico simulations, we focus on interpretable low-variance methods that can be interrogated for conformity with chemical intuition and that promise to work well with only 592 training data points, which we curated from the PISA literature. We found that among the evaluated linear models, generalized additive models, and rule and tree ensembles, all but the linear models show a decent interpolation performance with around 0.2 estimated error rate and 1 bit expected cross-entropy loss (surprisal) when predicting the mixture of morphologies formed from monomer pairs already encountered in the training data. When considering extrapolation to new monomer combinations, the model performance is weaker, but the best model (random forest) still achieves highly nontrivial prediction performance (0.27 error rate, 1.6 bit surprisal), which renders it a good candidate to support the creation of empirical phase diagrams for new monomers and conditions. Indeed, we find in three case studies that, when used to actively learn phase diagrams, the model is able to select a smart set of experiments that lead to satisfactory phase diagrams after observing only relatively few data points (5–16) for the targeted conditions. The data set as well as all model training and evaluation code are publicly available through the GitHub repository of the last author.
format	Online Article Text
id	pubmed-10268968
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-10268968 2023-06-16. J Chem Inf Model. American Chemical Society, 2023-05-19. © 2023 The Authors. Published by American Chemical Society. License: https://creativecommons.org/licenses/by-nc-nd/4.0/ (permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works).
title	Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10268968/ https://www.ncbi.nlm.nih.gov/pubmed/37208794 http://dx.doi.org/10.1021/acs.jcim.3c00460

Similar Items