Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki–Miyaura Coupling

Applications of machine learning (ML) to synthetic chemistry rely on the assumption that large numbers of literature-reported examples should enable construction of accurate and predictive models of chemical reactivity. This paper demonstrates that abundance of carefully curated li...

Bibliographic Details
Main authors: Beker, Wiktor; Roszak, Rafał; Wołos, Agnieszka; Angello, Nicholas H.; Rathore, Vandana; Burke, Martin D.; Grzybowski, Bartosz A.
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8949728/
https://www.ncbi.nlm.nih.gov/pubmed/35258973
http://dx.doi.org/10.1021/jacs.1c12005
author Beker, Wiktor
Roszak, Rafał
Wołos, Agnieszka
Angello, Nicholas H.
Rathore, Vandana
Burke, Martin D.
Grzybowski, Bartosz A.
author_sort Beker, Wiktor
collection PubMed
description Applications of machine learning (ML) to synthetic chemistry rely on the assumption that large numbers of literature-reported examples should enable construction of accurate and predictive models of chemical reactivity. This paper demonstrates that abundance of carefully curated literature data may be insufficient for this purpose. Using an example of Suzuki–Miyaura coupling with heterocyclic building blocks—and a carefully selected database of >10,000 literature examples—we show that ML models cannot offer any meaningful predictions of optimum reaction conditions, even if the search space is restricted to only solvents and bases. This result holds irrespective of the ML model applied (from simple feed-forward to state-of-the-art graph-convolution neural networks) or the representation to describe the reaction partners (various fingerprints, chemical descriptors, latent representations, etc.). In all cases, the ML methods fail to perform significantly better than naive assignments based on the sheer frequency of certain reaction conditions reported in the literature. These unsatisfactory results likely reflect subjective preferences of various chemists to use certain protocols, other biasing factors as mundane as availability of certain solvents/reagents, and/or a lack of negative data. These findings highlight the likely importance of systematically generating reliable and standardized data sets for algorithm training.
format Online
Article
Text
id pubmed-8949728
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-8949728 2022-03-28 Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki–Miyaura Coupling. Beker, Wiktor; Roszak, Rafał; Wołos, Agnieszka; Angello, Nicholas H.; Rathore, Vandana; Burke, Martin D.; Grzybowski, Bartosz A. J Am Chem Soc. American Chemical Society 2022-03-08 2022-03-23 /pmc/articles/PMC8949728/ /pubmed/35258973 http://dx.doi.org/10.1021/jacs.1c12005 Text en
© 2022 The Authors. Published by American Chemical Society
https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works.
title Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki–Miyaura Coupling