Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs

Bibliographic Details
Main authors: Einarson, Kasper A.; Bendtsen, Kristian M.; Li, Kang; Thomsen, Maria; Kristensen, Niels R.; Winther, Ole; Fulle, Simone; Clemmensen, Line; Refsgaard, Hanne H.F.
Format: Online, Article, Text
Language: English
Published: American Chemical Society, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10324072/
https://www.ncbi.nlm.nih.gov/pubmed/37426277
http://dx.doi.org/10.1021/acsomega.3c01218
Collection: PubMed

Abstract

Therapeutic peptides and proteins derived either from endogenous hormones, such as insulin, or from de novo design via display technologies occupy a distinct pharmaceutical space between small molecules and large proteins such as antibodies. Optimizing the pharmacokinetic (PK) profile of drug candidates is highly important when prioritizing lead candidates, and machine-learning models can be a relevant tool for accelerating the drug design process. Predicting PK parameters of proteins remains difficult because of the complex factors that influence PK properties; furthermore, the data sets are small compared with the variety of compounds in the protein space. This study describes a novel combination of molecular descriptors for proteins such as insulin analogs, many of which contain chemical modifications, e.g., attached small molecules that protract the half-life. The underlying data set consisted of 640 structurally diverse insulin analogs, around half of which had attached small molecules. Other analogs were conjugated to peptides, amino acid extensions, or fragment crystallizable (Fc) regions. The PK parameters clearance (CL), half-life (T1/2), and mean residence time (MRT) could be predicted using classical machine-learning models such as Random Forest (RF) and Artificial Neural Networks (ANN), with root-mean-square errors for CL of 0.60 and 0.68 (log units) and average fold errors of 2.5 and 2.9 for RF and ANN, respectively. Both random and temporal data splits were used to evaluate ideal and prospective model performance; regardless of the split, the best models achieved at least 70% of predictions within a twofold error.
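
The error metrics quoted above can be made concrete with a short sketch. It assumes base-10 logarithms for the "log units" and takes the average fold error in its absolute form, 10 raised to the mean of |log10(predicted/observed)|; this record does not state which definitions the authors used, and the function name `pk_error_metrics` is hypothetical.

```python
import math
from typing import Sequence


def pk_error_metrics(observed: Sequence[float], predicted: Sequence[float]) -> dict:
    """Log-scale error metrics for positive PK values such as clearance (CL).

    Assumption: "log units" means log10, and the average fold error is the
    absolute form; the source record does not define either.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    if any(v <= 0 for v in observed) or any(v <= 0 for v in predicted):
        raise ValueError("PK values must be strictly positive to take logarithms")

    # Residual on the log10 scale: log10(pred) - log10(obs) == log10(pred / obs)
    log_ratios = [math.log10(p / o) for o, p in zip(observed, predicted)]
    n = len(log_ratios)

    # Root-mean-square error of the log10 residuals
    rmse_log10 = math.sqrt(sum(r * r for r in log_ratios) / n)

    # Absolute average fold error: geometric mean of the fold deviations
    afe = 10 ** (sum(abs(r) for r in log_ratios) / n)

    # Share of predictions with 0.5 <= pred/obs <= 2 (small tolerance for rounding)
    twofold = math.log10(2) + 1e-12
    frac_within_2fold = sum(abs(r) <= twofold for r in log_ratios) / n

    return {"rmse_log10": rmse_log10, "afe": afe, "frac_within_2fold": frac_within_2fold}
```

For example, observed values of 1, 10 and 100 with predictions of 2, 10 and 10 give one prediction exactly at the twofold boundary, one exact, and one off by a factor of ten, so two of the three predictions count as within a twofold error.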
The tested molecular representations include (1) global physicochemical descriptors combined with descriptors encoding the amino acid composition of the insulin analogs, (2) physicochemical descriptors of the attached small molecule, (3) a protein language model (evolutionary scale modeling, ESM) embedding of the amino acid sequence of the molecules, and (4) a natural-language-processing-inspired embedding (mol2vec) of the attached small molecule. Encoding the attached small molecule via (2) or (4) significantly improved the predictions, whereas the benefit of the protein language model-based encoding (3) depended on the machine-learning model used. Using Shapley additive explanations (SHAP) values, the most important molecular descriptors were identified as those related to the molecular size of both the protein and the protraction part. Overall, the results show that combining representations of proteins and small molecules was key for PK predictions of insulin analogs.
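
One simple way to combine protein and small-molecule representations like those listed above is to concatenate them into a single fixed-length feature vector per analog. The sketch below assumes mean pooling of per-residue protein language model embeddings, mol2vec-style summation of substructure vectors, and zero-filling plus a presence flag for analogs without an attached small molecule. These choices and all function names are hypothetical illustrations, not the authors' documented pipeline.

```python
from typing import Sequence

Vector = list[float]


def mean_pool(per_residue: Sequence[Sequence[float]]) -> Vector:
    """Average per-residue embeddings into one protein-level vector (assumed pooling)."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    n, dim = len(per_residue), len(per_residue[0])
    return [sum(row[j] for row in per_residue) / n for j in range(dim)]


def sum_pool(substructure_vectors: Sequence[Sequence[float]]) -> Vector:
    """mol2vec-style molecule vector: element-wise sum of substructure embeddings."""
    dim = len(substructure_vectors[0])
    return [sum(v[j] for v in substructure_vectors) for j in range(dim)]


def combine_representations(
    protein_descriptors: Sequence[float],
    residue_embeddings: Sequence[Sequence[float]],
    small_molecule_vectors: Sequence[Sequence[float]],
    small_molecule_dim: int,
) -> Vector:
    """Concatenate protein features and (optional) small-molecule features for one analog."""
    # Protein part: global descriptors followed by the pooled language-model embedding
    protein_part = list(protein_descriptors) + mean_pool(residue_embeddings)

    if small_molecule_vectors:
        small_molecule_part = sum_pool(small_molecule_vectors)
        if len(small_molecule_part) != small_molecule_dim:
            raise ValueError("small-molecule vector has an unexpected dimension")
        has_small_molecule = 1.0
    else:
        # No attached small molecule: zero-fill so every analog has the same length
        small_molecule_part = [0.0] * small_molecule_dim
        has_small_molecule = 0.0

    # Presence flag appended last, so models can tell "absent" from "all-zero"
    return protein_part + small_molecule_part + [has_small_molecule]
```

A fixed-length vector of this kind can be passed to any tabular regressor, such as a random forest; the presence flag gives a model an explicit way to distinguish analogs without an attached small molecule from the zero-filled block.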

Record ID: pubmed-10324072
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: ACS Omega (American Chemical Society), published 2023-06-22; record date 2023-07-07
License: © 2023 The Authors. Published by American Chemical Society under CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.