Incorporating Domain Knowledge and Structure-Based Descriptors for Machine Learning: A Case Study of Pd-Catalyzed Sonogashira Reactions

Bibliographic Details
Main authors: Chan, Kalok; Ta, Long Thanh; Huang, Yong; Su, Haibin; Lin, Zhenyang
Format: Online article (text)
Language: English
Published: MDPI, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10302643/
https://www.ncbi.nlm.nih.gov/pubmed/37375286
http://dx.doi.org/10.3390/molecules28124730
Abstract: Machine learning has revolutionized information processing for large datasets across various fields. However, its limited interpretability poses a significant challenge when applied to chemistry. In this study, we developed a set of simple molecular representations to capture the structural information of ligands in palladium-catalyzed Sonogashira coupling reactions of aryl bromides. Drawing inspiration from human understanding of catalytic cycles, we used a graph neural network to extract structural details of the phosphine ligand, a major contributor to the overall activation energy. We combined these simple molecular representations with an electronic descriptor of the aryl bromide as inputs to a fully connected neural network unit. The results allowed us to predict rate constants and gain mechanistic insights into the rate-limiting oxidative addition process using a relatively small dataset. This study highlights the importance of incorporating domain knowledge in machine learning and presents an alternative approach to data analysis.
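As an illustration only, the pipeline outlined in the abstract (a graph neural network that embeds the phosphine ligand, whose output is concatenated with one electronic descriptor of the aryl bromide and passed to a fully connected network) can be sketched in plain Python. This is not the authors' model. The atom featurization (one-hot P/C with implicit hydrogens), the sum-aggregation message-passing rule, the layer sizes, the reading of the output as ln(k), the names `LigandGNN` and `RatePredictor`, and the untrained random weights are all hypothetical choices made for this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Hypothetical untrained weights, uniform in [-1/sqrt(cols), 1/sqrt(cols)].
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vec_add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


class LigandGNN:
    """Sum-aggregation message passing over a ligand graph (biases omitted)."""

    def __init__(self, in_dim: int, hidden_dim: int, n_layers: int, rng: random.Random):
        self.layers: List[Tuple[Matrix, Matrix]] = []
        dim = in_dim
        for _ in range(n_layers):
            # One weight matrix for the atom's own state, one for its neighbours.
            self.layers.append((init_matrix(hidden_dim, dim, rng),
                                init_matrix(hidden_dim, dim, rng)))
            dim = hidden_dim

    def embed(self, features: List[Vector], adjacency: List[List[int]]) -> Vector:
        h = [list(f) for f in features]
        for w_self, w_nei in self.layers:
            new_h = []
            for v, h_v in enumerate(h):
                # Sum the current states of all bonded neighbours of atom v.
                nei_sum = [0.0] * len(h_v)
                for u in adjacency[v]:
                    nei_sum = vec_add(nei_sum, h[u])
                new_h.append(relu(vec_add(matvec(w_self, h_v), matvec(w_nei, nei_sum))))
            h = new_h
        # Readout: sum over atoms, so the embedding ignores atom ordering.
        return [sum(column) for column in zip(*h)]


class RatePredictor:
    """Ligand embedding + one aryl bromide descriptor -> fully connected net -> scalar."""

    def __init__(self, atom_dim: int, hidden_dim: int = 8, n_layers: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.gnn = LigandGNN(atom_dim, hidden_dim, n_layers, rng)
        self.w_hidden = init_matrix(hidden_dim, hidden_dim + 1, rng)
        self.w_out = init_matrix(1, hidden_dim, rng)

    def predict(self, features: List[Vector], adjacency: List[List[int]],
                aryl_descriptor: float) -> float:
        # Concatenate the ligand embedding with the scalar electronic descriptor.
        x = self.gnn.embed(features, adjacency) + [aryl_descriptor]
        hidden = relu(matvec(self.w_hidden, x))
        # Linear output, read here as a hypothetical ln(k) value.
        return matvec(self.w_out, hidden)[0]


if __name__ == "__main__":
    # Toy PMe3 heavy-atom graph: P (index 0) bonded to three C atoms.
    # Features are a hypothetical one-hot [is_P, is_C]; hydrogens are implicit.
    features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    adjacency = [[1, 2, 3], [0], [0], [0]]
    model = RatePredictor(atom_dim=2)
    # 0.23 is an arbitrary illustrative descriptor value.
    print(model.predict(features, adjacency, aryl_descriptor=0.23))
```

Sum pooling makes the ligand embedding independent of how the atoms are numbered, so relabeling the atoms of the toy graph leaves the prediction unchanged up to floating-point rounding. In practice the weights would be fitted to measured rate constants; no training loop is shown here.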
Journal: Molecules (MDPI), 2023-06-13
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).