On the use of real-world datasets for reaction yield prediction
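Main authors: | Saebi, Mandana; Nan, Bozhao; Herr, John E.; Wahlers, Jessica; Guo, Zhichun; Zurański, Andrzej M.; Kogej, Thierry; Norrby, Per-Ola; Doyle, Abigail G.; Chawla, Nitesh V.; Wiest, Olaf |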
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society of Chemistry, 2023 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10189898/ https://www.ncbi.nlm.nih.gov/pubmed/37206399 http://dx.doi.org/10.1039/d2sc06041h |
author | Saebi, Mandana; Nan, Bozhao; Herr, John E.; Wahlers, Jessica; Guo, Zhichun; Zurański, Andrzej M.; Kogej, Thierry; Norrby, Per-Ola; Doyle, Abigail G.; Chawla, Nitesh V.; Wiest, Olaf |
collection | PubMed |
description | The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki–Miyaura and Buchwald–Hartwig reactions. However, training the AGNN on an ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions. |
id | pubmed-10189898 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
record | pubmed-10189898, 2023-05-18 |
journal | Chem Sci |
published | The Royal Society of Chemistry, 2023-03-13 |
license | This journal is © The Royal Society of Chemistry; https://creativecommons.org/licenses/by-nc/3.0/ |
title | On the use of real-world datasets for reaction yield prediction |
topic | Chemistry |