
Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias


Bibliographic Details
Main authors: Kovács, Dávid Péter; McCorkindale, William; Lee, Alpha A.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7966799/
https://www.ncbi.nlm.nih.gov/pubmed/33727552
http://dx.doi.org/10.1038/s41467-021-21895-w
collection PubMed
description Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.
id pubmed-7966799
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7966799 2021-04-01 Nat Commun Article Nature Publishing Group UK 2021-03-16 Text en
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
topic Article