Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs
Main authors: | Einarson, Kasper A.; Bendtsen, Kristian M.; Li, Kang; Thomsen, Maria; Kristensen, Niels R.; Winther, Ole; Fulle, Simone; Clemmensen, Line; Refsgaard, Hanne H.F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10324072/ https://www.ncbi.nlm.nih.gov/pubmed/37426277 http://dx.doi.org/10.1021/acsomega.3c01218 |
author | Einarson, Kasper A.; Bendtsen, Kristian M.; Li, Kang; Thomsen, Maria; Kristensen, Niels R.; Winther, Ole; Fulle, Simone; Clemmensen, Line; Refsgaard, Hanne H.F. |
---|---|
collection | PubMed |
description | Therapeutic peptides and proteins derived either from endogenous hormones, such as insulin, or from de novo design via display technologies occupy a distinct pharmaceutical space between small molecules and large proteins such as antibodies. Optimizing the pharmacokinetic (PK) profile of drug candidates is highly important when prioritizing lead candidates, and machine-learning models can be a relevant tool for accelerating the drug design process. Predicting PK parameters of proteins remains difficult because of the complex factors that influence PK properties; furthermore, the data sets are small compared with the variety of compounds in the protein space. This study describes a novel combination of molecular descriptors for proteins such as insulin analogs, many of which contain chemical modifications, e.g., attached small molecules that protract the half-life. The underlying data set consisted of 640 structurally diverse insulin analogs, around half of which had attached small molecules. Other analogs were conjugated to peptides, amino acid extensions, or fragment crystallizable (Fc) regions. The PK parameters clearance (CL), half-life (T1/2), and mean residence time (MRT) could be predicted using classical machine-learning models such as Random Forest (RF) and Artificial Neural Networks (ANN), with root-mean-square errors for CL of 0.60 and 0.68 (log units) and average fold errors of 2.5 and 2.9 for RF and ANN, respectively. Both random and temporal data splits were used to evaluate ideal and prospective model performance; regardless of the split, the best models achieved at least 70% of predictions within a twofold error.
The tested molecular representations include (1) global physicochemical descriptors combined with descriptors encoding the amino acid composition of the insulin analogs, (2) physicochemical descriptors of the attached small molecule, (3) a protein language model (evolutionary scale modeling, ESM) embedding of the amino acid sequence of the molecules, and (4) a natural-language-processing-inspired embedding (mol2vec) of the attached small molecule. Encoding the attached small molecule via (2) or (4) significantly improved the predictions, whereas the benefit of the protein language model-based encoding (3) depended on the machine-learning model used. Using Shapley additive explanation (SHAP) values, the most important molecular descriptors were identified as those related to the molecular size of both the protein and the protraction part. Overall, the results show that combining representations of proteins and small molecules was key for PK predictions of insulin analogs. |
format | Online Article Text |
id | pubmed-10324072 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
journal | ACS Omega (published 2023-06-22) |
rights | © 2023 The Authors. Published by American Chemical Society. License: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/); permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit the creation of adaptations or other derivative works. |
title | Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10324072/ https://www.ncbi.nlm.nih.gov/pubmed/37426277 http://dx.doi.org/10.1021/acsomega.3c01218 |
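The evaluation quantities named in the abstract (root-mean-square error in log units, average fold error, share of predictions within a twofold error) and the temporal data split can be made concrete with a short Python sketch. This is not the authors' code. It assumes that "log units" means base-10 logarithms, takes the average fold error as 10 raised to the mean absolute log10 ratio of predicted to observed values (the form sometimes called the absolute average fold error), counts a prediction as within twofold error when it lies between half and twice the observed value, and implements the temporal split as training on the oldest compounds and testing on the newest. The helper names, the `dates` input and the 20% test fraction are hypothetical.

```python
import math
from datetime import date
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _log10_ratios(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """log10(predicted / observed) for each pair; all values must be positive."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    return [math.log10(p / o) for o, p in zip(observed, predicted)]


def rmse_log10(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error on log10-transformed values (assumed 'log units')."""
    ratios = _log10_ratios(observed, predicted)
    return math.sqrt(sum(r * r for r in ratios) / len(ratios))


def average_fold_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """10 ** mean(|log10(pred / obs)|); 1.0 means perfect predictions.

    Assumption: absolute-log form; the signed variant instead measures bias.
    """
    ratios = _log10_ratios(observed, predicted)
    return 10 ** (sum(abs(r) for r in ratios) / len(ratios))


def fraction_within_fold(observed: Sequence[float], predicted: Sequence[float],
                         fold: float = 2.0) -> float:
    """Share of predictions lying between obs / fold and obs * fold."""
    ratios = _log10_ratios(observed, predicted)
    limit = math.log10(fold) + 1e-12  # small tolerance for floating-point round-off
    return sum(abs(r) <= limit for r in ratios) / len(ratios)


def temporal_split(records: Sequence[T], dates: Sequence[date],
                   test_fraction: float = 0.2) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: train on the oldest records, test on the newest."""
    if len(records) != len(dates):
        raise ValueError("records and dates must be equally long")
    order = sorted(range(len(records)), key=lambda i: dates[i])  # oldest first
    n_test = max(1, round(len(records) * test_fraction))
    train = [records[i] for i in order[:-n_test]]
    test = [records[i] for i in order[-n_test:]]
    return train, test


if __name__ == "__main__":
    observed = [1.0, 2.0, 4.0, 10.0]   # toy values, not data from the study
    predicted = [2.0, 2.0, 1.0, 10.0]
    print(round(rmse_log10(observed, predicted), 3))          # 0.337
    print(round(average_fold_error(observed, predicted), 3))  # 1.682
    print(fraction_within_fold(observed, predicted))          # 0.75
```

With a perfect model, `rmse_log10` returns 0.0 and `average_fold_error` returns 1.0. A temporal split mimics prospective use: the model is scored only on compounds registered after those it was trained on, whereas a random split mixes old and new compounds in both sets.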