Global reactivity models are impactful in industrial synthesis applications
Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is the rea...
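Main authors: | Neves, Paulo; McClure, Kelly; Verhoeven, Jonas; Dyubankova, Natalia; Nugmanov, Ramil; Gedich, Andrey; Menon, Sairam; Shi, Zhicai; Wegner, Jörg K. |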
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921076/ https://www.ncbi.nlm.nih.gov/pubmed/36774523 http://dx.doi.org/10.1186/s13321-023-00685-0 |
_version_ | 1784887225557712896 |
---|---|
author | Neves, Paulo McClure, Kelly Verhoeven, Jonas Dyubankova, Natalia Nugmanov, Ramil Gedich, Andrey Menon, Sairam Shi, Zhicai Wegner, Jörg K. |
author_sort | Neves, Paulo |
collection | PubMed |
description | Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects, leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is reaction yield prediction. Every year more than one fifth of all synthesis attempts result in product yields which are either zero or too low. This equates to chemical and human resources being spent on activities which ultimately do not progress the programs, leading to a triple loss when accounting for the opportunity cost of the time wasted. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources, and fine-tune it to achieve an uncertainty-calibrated global yield prediction model. This model improves upon the state of the art not only because of the increase in pre-training data but also because it introduces a new embedding layer which solves a few limitations of SMILES and enables integration of additional information, such as equivalents and molecule role, into the reaction encoding. The model is called BERT Enriched Embedding (BEE). The model is benchmarked on an open-source dataset against a state-of-the-art synthesis-focused BERT, showing a near 20-point improvement in R² score. The model is fine-tuned and tested on an internal company data benchmark, and a prospective study shows that the application of the model can reduce the total number of negative reactions (yield under 5%) run at Janssen by at least 34%. Lastly, we corroborate the previous results through experimental validation, by directly deploying the model in an ongoing drug discovery project and showing that it can also be used successfully as a reagent recommender due to its fast inference speed and reliable confidence estimation, a critical feature for industry application.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00685-0. |
format | Online Article Text |
id | pubmed-9921076 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-9921076 2023-02-12 Global reactivity models are impactful in industrial synthesis applications Neves, Paulo McClure, Kelly Verhoeven, Jonas Dyubankova, Natalia Nugmanov, Ramil Gedich, Andrey Menon, Sairam Shi, Zhicai Wegner, Jörg K. J Cheminform Research Springer International Publishing 2023-02-11 /pmc/articles/PMC9921076/ /pubmed/36774523 http://dx.doi.org/10.1186/s13321-023-00685-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Global reactivity models are impactful in industrial synthesis applications |
topic | Research |