Predicting relative efficiency of amide bond formation using multivariate linear regression

Amides are ubiquitous in biologically active natural products and commercial drugs. The most common strategy for introducing this functional group is the coupling of a carboxylic acid with an amine, which requires the use of a coupling reagent to facilitate elimination of water. However, the optimal...

Bibliographic Details
Main authors: Haas, Brittany C., Goetz, Adam E., Bahamonde, Ana, McWilliams, J. Christopher, Sigman, Matthew S.
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2022
Subjects: Physical Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169781/
https://www.ncbi.nlm.nih.gov/pubmed/35412905
http://dx.doi.org/10.1073/pnas.2118451119
_version_ 1784721272983257088
author Haas, Brittany C.
Goetz, Adam E.
Bahamonde, Ana
McWilliams, J. Christopher
Sigman, Matthew S.
collection PubMed
description Amides are ubiquitous in biologically active natural products and commercial drugs. The most common strategy for introducing this functional group is the coupling of a carboxylic acid with an amine, which requires the use of a coupling reagent to facilitate elimination of water. However, the optimal reaction conditions often appear rather arbitrary to the specific reaction. Herein, we report the development of statistical models correlating measured rates to physical organic descriptors to enable the prediction of reaction rates for untested carboxylic acid/amine pairs. The key to the success of this endeavor was the development of an end-to-end data science–based workflow to select a set of coupling partners that are appropriately distributed in chemical space to facilitate statistical model development. By using a parameterization, dimensionality reduction, and clustering protocol, a training set was identified. Reaction rates for a range of carboxylic acid and primary alkyl amine couplings utilizing carbonyldiimidazole (CDI) as the coupling reagent were measured. The collected rates span five orders of magnitude, confirming that the designed training set encompasses a wide range of chemical space necessary for effective model development. Regressing these rates with high-level density functional theory (DFT) descriptors allowed for identification of a statistical model wherein the molecular features of the carboxylic acid are primarily responsible for the observed rates. Finally, out-of-sample amide couplings are used to determine the limitations and effectiveness of the model.
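The workflow summarized in the description above (parameterize the candidates, reduce dimensionality, cluster to choose a training set, then regress measured rates on descriptors) can be illustrated with a minimal sketch. This is not the authors' code. It assumes random synthetic numbers in place of DFT descriptors and measured rates, and it uses PCA and plain k-means as stand-ins for whatever dimensionality reduction and clustering the authors actually applied. The function names `select_training_set`, `fit_mlr`, and `predict_mlr` are hypothetical.

```python
import numpy as np


def select_training_set(descriptors, n_components=2, n_clusters=4,
                        n_iter=50, seed=0):
    """Return row indices of one representative candidate per cluster.

    Assumes every descriptor column has non-zero variance.
    """
    rng = np.random.default_rng(seed)
    # Standardize each descriptor column so no single scale dominates.
    z = (descriptors - descriptors.mean(axis=0)) / descriptors.std(axis=0)
    # PCA via SVD: project onto the leading principal components.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    # Plain k-means (Lloyd's algorithm) on the PCA scores.
    start = rng.choice(len(scores), size=n_clusters, replace=False)
    centers = scores[start]
    for _ in range(n_iter):
        dists = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = scores[labels == k].mean(axis=0)
    # Representative of a cluster = the candidate closest to its center.
    dists = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
    return sorted(set(dists.argmin(axis=0).tolist()))


def fit_mlr(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate the fitted linear model on new descriptor rows."""
    return coef[0] + X @ coef[1:]


# Synthetic stand-in data: 40 hypothetical candidates, 3 made-up descriptors.
rng = np.random.default_rng(42)
descriptors = rng.normal(size=(40, 3))
# Made-up "log10 rate" with a linear dependence plus small noise.
log_rate = 0.5 + descriptors @ np.array([1.2, -0.8, 0.3])
log_rate = log_rate + rng.normal(scale=0.05, size=40)

train_idx = select_training_set(descriptors, n_clusters=10)
coef = fit_mlr(descriptors[train_idx], log_rate[train_idx])

# Out-of-sample check on the candidates that were not selected.
test_idx = [i for i in range(40) if i not in train_idx]
residuals = predict_mlr(coef, descriptors[test_idx]) - log_rate[test_idx]
print("training set size:", len(train_idx))
print("out-of-sample RMSE (log10 units):", float(np.sqrt(np.mean(residuals**2))))
```

Regressing the logarithm of the rate rather than the rate itself is a choice made for this sketch, since rates spanning several orders of magnitude are usually easier to fit on a log scale. The record does not state which form the authors regressed.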
format Online
Article
Text
id pubmed-9169781
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-9169781 2022-10-11 Predicting relative efficiency of amide bond formation using multivariate linear regression Haas, Brittany C. Goetz, Adam E. Bahamonde, Ana McWilliams, J. Christopher Sigman, Matthew S. Proc Natl Acad Sci U S A Physical Sciences National Academy of Sciences 2022-04-11 2022-04-19 /pmc/articles/PMC9169781/ /pubmed/35412905 http://dx.doi.org/10.1073/pnas.2118451119 Text en Copyright © 2022 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Predicting relative efficiency of amide bond formation using multivariate linear regression
topic Physical Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169781/
https://www.ncbi.nlm.nih.gov/pubmed/35412905
http://dx.doi.org/10.1073/pnas.2118451119