Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs

Therapeutic peptides and proteins derived from either endogenous hormones, such as insulin, or de novo design via display technologies occupy a distinct pharmaceutical space in between small molecules and large proteins such as antibodies. Optimizing the pharmacokinetic (PK) profil...

Bibliographic Details
Main authors:	Einarson, Kasper A.; Bendtsen, Kristian M.; Li, Kang; Thomsen, Maria; Kristensen, Niels R.; Winther, Ole; Fulle, Simone; Clemmensen, Line; Refsgaard, Hanne H.F.
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2023
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10324072/ https://www.ncbi.nlm.nih.gov/pubmed/37426277 http://dx.doi.org/10.1021/acsomega.3c01218

author	Einarson, Kasper A.; Bendtsen, Kristian M.; Li, Kang; Thomsen, Maria; Kristensen, Niels R.; Winther, Ole; Fulle, Simone; Clemmensen, Line; Refsgaard, Hanne H.F.
author_sort	Einarson, Kasper A.
collection	PubMed
description	Therapeutic peptides and proteins derived from either endogenous hormones, such as insulin, or de novo design via display technologies occupy a distinct pharmaceutical space between small molecules and large proteins such as antibodies. Optimizing the pharmacokinetic (PK) profile of drug candidates is important for prioritizing lead candidates, and machine-learning models can help accelerate the drug design process. Predicting PK parameters of proteins remains difficult because of the complex factors that influence PK properties; furthermore, the data sets are small compared to the variety of compounds in the protein space. This study describes a novel combination of molecular descriptors for proteins such as insulin analogs, many of which contained chemical modifications, e.g., attached small molecules for protraction of the half-life. The underlying data set consisted of 640 structurally diverse insulin analogs, of which around half had attached small molecules. Other analogs were conjugated to peptides, amino acid extensions, or fragment crystallizable regions. The PK parameters clearance (CL), half-life (T1/2), and mean residence time (MRT) could be predicted by using classical machine-learning models such as Random Forest (RF) and Artificial Neural Networks (ANN), with root-mean-square errors for CL of 0.60 and 0.68 (log units) and average fold errors of 2.5 and 2.9 for RF and ANN, respectively. Both random and temporal data splits were employed to evaluate ideal and prospective model performance; the best models, regardless of data split, achieved a minimum of 70% of predictions within a twofold error. The tested molecular representations include (1) global physicochemical descriptors combined with descriptors encoding the amino acid composition of the insulin analogs, (2) physicochemical descriptors of the attached small molecule, (3) a protein language model (evolutionary scale modeling) embedding of the amino acid sequence of the molecules, and (4) a natural-language-processing-inspired embedding (mol2vec) of the attached small molecule. Encoding the attached small molecule via (2) or (4) significantly improved the predictions, while the benefit of the protein language model-based encoding (3) depended on the machine-learning model used. Using Shapley additive explanations (SHAP) values, the most important molecular descriptors were identified as those related to the molecular size of both the protein and the protraction part. Overall, the results show that combining representations of proteins and small molecules was key for PK predictions of insulin analogs.
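The error measures named in the description (root-mean-square error in log units, average fold error, and the share of predictions within a twofold error) can be illustrated with a minimal sketch. The record does not give the formulas, so the definitions below are assumptions: errors are taken as log10(predicted/observed), and the average fold error is taken as 10 raised to the mean absolute log10 error. The function name `pk_error_metrics` and the input values are hypothetical and are not taken from the study.

```python
import math


def pk_error_metrics(observed, predicted):
    """Return RMSE (log10 units), average fold error, and fraction within twofold.

    Assumed definitions (not confirmed by the catalog record):
      log error         = log10(predicted / observed)
      RMSE              = sqrt(mean(log error ** 2))
      average fold err. = 10 ** mean(|log error|)
      within twofold    = max(pred/obs, obs/pred) <= 2
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= 0 for v in list(observed) + list(predicted)):
        raise ValueError("PK values must be strictly positive")

    n = len(observed)
    log_errors = [math.log10(p / o) for o, p in zip(observed, predicted)]

    rmse_log = math.sqrt(sum(e * e for e in log_errors) / n)
    average_fold_error = 10 ** (sum(abs(e) for e in log_errors) / n)
    within_twofold = sum(
        1 for o, p in zip(observed, predicted) if max(p / o, o / p) <= 2.0
    ) / n

    return {
        "rmse_log": rmse_log,
        "average_fold_error": average_fold_error,
        "within_twofold": within_twofold,
    }


# Hypothetical clearance values, for illustration only.
metrics = pk_error_metrics([1.0, 2.0, 4.0], [1.0, 4.0, 1.0])
print(metrics)
```

Working on the log scale treats a twofold over-prediction and a twofold under-prediction as equally large errors, which is presumably why the description reports the RMSE in log units.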
format	Online Article Text
id	pubmed-10324072
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-10324072 2023-07-07 Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs Einarson, Kasper A.; Bendtsen, Kristian M.; Li, Kang; Thomsen, Maria; Kristensen, Niels R.; Winther, Ole; Fulle, Simone; Clemmensen, Line; Refsgaard, Hanne H.F. ACS Omega American Chemical Society 2023-06-22 /pmc/articles/PMC10324072/ /pubmed/37426277 http://dx.doi.org/10.1021/acsomega.3c01218 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10324072/ https://www.ncbi.nlm.nih.gov/pubmed/37426277 http://dx.doi.org/10.1021/acsomega.3c01218

Similar items