Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias
Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salie...
Main authors: | Kovács, Dávid Péter; McCorkindale, William; Lee, Alpha A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7966799/ https://www.ncbi.nlm.nih.gov/pubmed/33727552 http://dx.doi.org/10.1038/s41467-021-21895-w |
_version_ | 1783665738661232640 |
---|---|
author | Kovács, Dávid Péter McCorkindale, William Lee, Alpha A. |
author_facet | Kovács, Dávid Péter McCorkindale, William Lee, Alpha A. |
author_sort | Kovács, Dávid Péter |
collection | PubMed |
description | Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models. |
format | Online Article Text |
id | pubmed-7966799 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7966799 2021-04-01 Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias Kovács, Dávid Péter McCorkindale, William Lee, Alpha A. Nat Commun Article Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models. Nature Publishing Group UK 2021-03-16 /pmc/articles/PMC7966799/ /pubmed/33727552 http://dx.doi.org/10.1038/s41467-021-21895-w Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Kovács, Dávid Péter McCorkindale, William Lee, Alpha A. Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias |
title | Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias |
title_full | Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias |
title_fullStr | Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias |
title_full_unstemmed | Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias |
title_short | Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias |
title_sort | quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7966799/ https://www.ncbi.nlm.nih.gov/pubmed/33727552 http://dx.doi.org/10.1038/s41467-021-21895-w |
work_keys_str_mv | AT kovacsdavidpeter quantitativeinterpretationexplainsmachinelearningmodelsforchemicalreactionpredictionanduncoversbias AT mccorkindalewilliam quantitativeinterpretationexplainsmachinelearningmodelsforchemicalreactionpredictionanduncoversbias AT leealphaa quantitativeinterpretationexplainsmachinelearningmodelsforchemicalreactionpredictionanduncoversbias |