Predicting relative efficiency of amide bond formation using multivariate linear regression
Amides are ubiquitous in biologically active natural products and commercial drugs. The most common strategy for introducing this functional group is the coupling of a carboxylic acid with an amine, which requires the use of a coupling reagent to facilitate elimination of water. However, the optimal...
Main authors: | Haas, Brittany C.; Goetz, Adam E.; Bahamonde, Ana; McWilliams, J. Christopher; Sigman, Matthew S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences 2022 |
Subjects: | Physical Sciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169781/ https://www.ncbi.nlm.nih.gov/pubmed/35412905 http://dx.doi.org/10.1073/pnas.2118451119 |
_version_ | 1784721272983257088 |
---|---|
author | Haas, Brittany C. Goetz, Adam E. Bahamonde, Ana McWilliams, J. Christopher Sigman, Matthew S. |
author_facet | Haas, Brittany C. Goetz, Adam E. Bahamonde, Ana McWilliams, J. Christopher Sigman, Matthew S. |
author_sort | Haas, Brittany C. |
collection | PubMed |
description | Amides are ubiquitous in biologically active natural products and commercial drugs. The most common strategy for introducing this functional group is the coupling of a carboxylic acid with an amine, which requires the use of a coupling reagent to facilitate elimination of water. However, the optimal reaction conditions often appear rather arbitrary to the specific reaction. Herein, we report the development of statistical models correlating measured rates to physical organic descriptors to enable the prediction of reaction rates for untested carboxylic acid/amine pairs. The key to the success of this endeavor was the development of an end-to-end data science–based workflow to select a set of coupling partners that are appropriately distributed in chemical space to facilitate statistical model development. By using a parameterization, dimensionality reduction, and clustering protocol, a training set was identified. Reaction rates for a range of carboxylic acid and primary alkyl amine couplings utilizing carbonyldiimidazole (CDI) as the coupling reagent were measured. The collected rates span five orders of magnitude, confirming that the designed training set encompasses a wide range of chemical space necessary for effective model development. Regressing these rates with high-level density functional theory (DFT) descriptors allowed for identification of a statistical model wherein the molecular features of the carboxylic acid are primarily responsible for the observed rates. Finally, out-of-sample amide couplings are used to determine the limitations and effectiveness of the model. |
format | Online Article Text |
id | pubmed-9169781 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-91697812022-10-11 Predicting relative efficiency of amide bond formation using multivariate linear regression Haas, Brittany C. Goetz, Adam E. Bahamonde, Ana McWilliams, J. Christopher Sigman, Matthew S. Proc Natl Acad Sci U S A Physical Sciences Amides are ubiquitous in biologically active natural products and commercial drugs. The most common strategy for introducing this functional group is the coupling of a carboxylic acid with an amine, which requires the use of a coupling reagent to facilitate elimination of water. However, the optimal reaction conditions often appear rather arbitrary to the specific reaction. Herein, we report the development of statistical models correlating measured rates to physical organic descriptors to enable the prediction of reaction rates for untested carboxylic acid/amine pairs. The key to the success of this endeavor was the development of an end-to-end data science–based workflow to select a set of coupling partners that are appropriately distributed in chemical space to facilitate statistical model development. By using a parameterization, dimensionality reduction, and clustering protocol, a training set was identified. Reaction rates for a range of carboxylic acid and primary alkyl amine couplings utilizing carbonyldiimidazole (CDI) as the coupling reagent were measured. The collected rates span five orders of magnitude, confirming that the designed training set encompasses a wide range of chemical space necessary for effective model development. Regressing these rates with high-level density functional theory (DFT) descriptors allowed for identification of a statistical model wherein the molecular features of the carboxylic acid are primarily responsible for the observed rates. Finally, out-of-sample amide couplings are used to determine the limitations and effectiveness of the model. 
National Academy of Sciences 2022-04-11 2022-04-19 /pmc/articles/PMC9169781/ /pubmed/35412905 http://dx.doi.org/10.1073/pnas.2118451119 Text en Copyright © 2022 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Physical Sciences Haas, Brittany C. Goetz, Adam E. Bahamonde, Ana McWilliams, J. Christopher Sigman, Matthew S. Predicting relative efficiency of amide bond formation using multivariate linear regression |
title | Predicting relative efficiency of amide bond formation using multivariate linear regression |
title_full | Predicting relative efficiency of amide bond formation using multivariate linear regression |
title_fullStr | Predicting relative efficiency of amide bond formation using multivariate linear regression |
title_full_unstemmed | Predicting relative efficiency of amide bond formation using multivariate linear regression |
title_short | Predicting relative efficiency of amide bond formation using multivariate linear regression |
title_sort | predicting relative efficiency of amide bond formation using multivariate linear regression |
topic | Physical Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9169781/ https://www.ncbi.nlm.nih.gov/pubmed/35412905 http://dx.doi.org/10.1073/pnas.2118451119 |
work_keys_str_mv | AT haasbrittanyc predictingrelativeefficiencyofamidebondformationusingmultivariatelinearregression AT goetzadame predictingrelativeefficiencyofamidebondformationusingmultivariatelinearregression AT bahamondeana predictingrelativeefficiencyofamidebondformationusingmultivariatelinearregression AT mcwilliamsjchristopher predictingrelativeefficiencyofamidebondformationusingmultivariatelinearregression AT sigmanmatthews predictingrelativeefficiencyofamidebondformationusingmultivariatelinearregression |
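
The description field above mentions regressing measured rates against molecular descriptors with multivariate linear regression and then checking the model on out-of-sample couplings. The following is a minimal sketch of that general technique only, not the authors' workflow. The descriptor matrix, the coefficients, and the noise level are synthetic placeholders generated for illustration, and the function names (`standardize`, `fit_mlr`, `predict_mlr`, `r_squared`) are hypothetical helpers, not part of any published code.

```python
import numpy as np


def standardize(X):
    """Scale each descriptor column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std


def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + X @ b.

    Returns the coefficient vector [b0, b1, ..., bn].
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate the fitted linear model on new descriptor rows."""
    return coef[0] + X @ coef[1:]


def r_squared(y, y_hat):
    """Coefficient of determination between observed and predicted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data (assumption): 3 made-up descriptors per
# substrate and a made-up linear relationship to log10(rate).
rng = np.random.default_rng(0)
n_train, n_test, n_desc = 40, 10, 3
X_all = rng.normal(size=(n_train + n_test, n_desc))
made_up_coef = np.array([-2.0, 1.5, -0.8, 0.3])
noise = rng.normal(scale=0.1, size=n_train + n_test)
log_k_all = made_up_coef[0] + X_all @ made_up_coef[1:] + noise

# Hold out some rows to mimic an out-of-sample check.
X_train, X_test = X_all[:n_train], X_all[n_train:]
y_train, y_test = log_k_all[:n_train], log_k_all[n_train:]

# Scale the test set with the training-set statistics only.
X_train_s, mean, std = standardize(X_train)
X_test_s = (X_test - mean) / std

coef = fit_mlr(X_train_s, y_train)
r2_train = r_squared(y_train, predict_mlr(coef, X_train_s))
r2_test = r_squared(y_test, predict_mlr(coef, X_test_s))
print("coefficients:", coef)
print("R^2 train:", r2_train, "R^2 held-out:", r2_test)
```

Fitting the logarithm of the rate rather than the rate itself is a common choice when values span several orders of magnitude, as the description reports for the measured rates. Standardizing the descriptors makes the fitted coefficients comparable in magnitude across descriptors.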