Global reactivity models are impactful in industrial synthesis applications

Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is the rea...

Bibliographic Details
Main authors: Neves, Paulo; McClure, Kelly; Verhoeven, Jonas; Dyubankova, Natalia; Nugmanov, Ramil; Gedich, Andrey; Menon, Sairam; Shi, Zhicai; Wegner, Jörg K.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921076/
https://www.ncbi.nlm.nih.gov/pubmed/36774523
http://dx.doi.org/10.1186/s13321-023-00685-0
author Neves, Paulo
McClure, Kelly
Verhoeven, Jonas
Dyubankova, Natalia
Nugmanov, Ramil
Gedich, Andrey
Menon, Sairam
Shi, Zhicai
Wegner, Jörg K.
collection PubMed
description Artificial Intelligence is revolutionizing many aspects of the pharmaceutical industry. Deep learning models are now routinely applied to guide drug discovery projects, leading to faster and improved findings, but there are still many tasks with enormous unrealized potential. One such task is reaction yield prediction. Every year, more than one fifth of all synthesis attempts result in product yields that are either zero or too low. This means chemical and human resources are spent on activities that ultimately do not progress the programs, a triple loss once the opportunity cost of the wasted time is counted. In this work we pre-train a BERT model on more than 16 million reactions from 4 different data sources and fine-tune it to obtain an uncertainty-calibrated global yield prediction model. The model improves on the state of the art not only through the larger pre-training dataset but also by introducing a new embedding layer, which addresses a few limitations of SMILES and enables additional information, such as equivalents and molecule role, to be integrated into the reaction encoding. The model is called BERT Enriched Embedding (BEE). It is benchmarked on an open-source dataset against a state-of-the-art synthesis-focused BERT, showing a near 20-point improvement in R² score. The model is then fine-tuned and tested on an internal company data benchmark, and a prospective study shows that applying it can reduce the total number of negative reactions (yield under 5%) run at Janssen by at least 34%. Lastly, we corroborate these results through experimental validation by deploying the model directly in an ongoing drug discovery project, showing that it can also be used successfully as a reagent recommender thanks to its fast inference speed and reliable confidence estimation, a critical feature for industry application.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00685-0.
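The abstract describes using calibrated yield predictions to avoid negative reactions (yield under 5%). A minimal sketch of how such a screen might look is given below. It is not the authors' implementation: the `YieldPrediction` record, the Gaussian form of the predictive distribution, and the `min_prob` cutoff are all assumptions made for illustration; only the 5% threshold comes from the record.

```python
from dataclasses import dataclass
from math import erf, sqrt

# "Negative reaction" cutoff quoted in the abstract (yield under 5%).
NEGATIVE_YIELD_THRESHOLD = 5.0


@dataclass
class YieldPrediction:
    """Hypothetical model output: predicted yield and its uncertainty, in percent."""
    reaction_id: str
    mean: float
    std: float


def prob_negative(pred, threshold=NEGATIVE_YIELD_THRESHOLD):
    """Probability that the yield falls below the threshold.

    Assumes a Gaussian predictive distribution, which is an illustrative
    choice and not something stated in the record.
    """
    if pred.std <= 0:
        # Degenerate case: no uncertainty, so compare the point estimate.
        return 1.0 if pred.mean < threshold else 0.0
    z = (threshold - pred.mean) / pred.std
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z


def flag_likely_negatives(preds, min_prob=0.9):
    """Return ids of reactions whose probability of a negative outcome is at least min_prob."""
    return [p.reaction_id for p in preds if prob_negative(p) >= min_prob]


predictions = [
    YieldPrediction("rxn-a", mean=2.0, std=1.0),    # confidently low yield
    YieldPrediction("rxn-b", mean=60.0, std=10.0),  # confidently high yield
    YieldPrediction("rxn-c", mean=6.0, std=15.0),   # low mean, but very uncertain
]
flagged = flag_likely_negatives(predictions)
print(flagged)
```

In this sketch only the confidently low prediction is flagged; the uncertain one is left for a chemist to judge, which is one way a confidence estimate could be used in practice.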
format Online
Article
Text
id pubmed-9921076
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-9921076 2023-02-12 J Cheminform Research Springer International Publishing 2023-02-11 /pmc/articles/PMC9921076/ /pubmed/36774523 http://dx.doi.org/10.1186/s13321-023-00685-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Global reactivity models are impactful in industrial synthesis applications
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9921076/
https://www.ncbi.nlm.nih.gov/pubmed/36774523
http://dx.doi.org/10.1186/s13321-023-00685-0