
Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias

Bibliographic Details
Main authors:	Kovács, Dávid Péter; McCorkindale, William; Lee, Alpha A.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7966799/ https://www.ncbi.nlm.nih.gov/pubmed/33727552 http://dx.doi.org/10.1038/s41467-021-21895-w

author	Kovács, Dávid Péter; McCorkindale, William; Lee, Alpha A.
collection	PubMed
description	Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.
format	Online Article Text
id	pubmed-7966799
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-79667992021-04-01 Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias Kovács, Dávid Péter McCorkindale, William Lee, Alpha A. Nat Commun Article Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black-boxes. It is neither clear if the models are making correct predictions because they inferred the salient chemistry, nor is it clear which training data they are relying on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants, and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models. Nature Publishing Group UK 2021-03-16 /pmc/articles/PMC7966799/ /pubmed/33727552 http://dx.doi.org/10.1038/s41467-021-21895-w Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7966799/ https://www.ncbi.nlm.nih.gov/pubmed/33727552 http://dx.doi.org/10.1038/s41467-021-21895-w

Similar items