Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features

Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has steadily increased, as evidenced by the FDA’s recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (msAbs) have garnered particular interest owing to t...

Bibliographic Details
Main authors: Harmalkar, Ameya, Rao, Roshan, Richard Xie, Yuxuan, Honer, Jonas, Deisting, Wibke, Anlahr, Jonas, Hoenig, Anja, Czwikla, Julia, Sienz-Widmann, Eva, Rau, Doris, Rice, Austin J., Riley, Timothy P., Li, Danqing, Catterall, Hannah B., Tinberg, Christine E., Gray, Jeffrey J., Wei, Kathy Y.
Format: Online Article Text
Language: English
Published: Taylor & Francis 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872953/
https://www.ncbi.nlm.nih.gov/pubmed/36683173
http://dx.doi.org/10.1080/19420862.2022.2163584
author Harmalkar, Ameya
Rao, Roshan
Richard Xie, Yuxuan
Honer, Jonas
Deisting, Wibke
Anlahr, Jonas
Hoenig, Anja
Czwikla, Julia
Sienz-Widmann, Eva
Rau, Doris
Rice, Austin J.
Riley, Timothy P.
Li, Danqing
Catterall, Hannah B.
Tinberg, Christine E.
Gray, Jeffrey J.
Wei, Kathy Y.
author_sort Harmalkar, Ameya
collection PubMed
description Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has steadily increased, as evidenced by the FDA’s recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (msAbs) have garnered particular interest owing to the advantage of engaging distinct targets. One important modular component of msAbs is the single-chain variable fragment (scFv). Despite the exquisite specificity and affinity of these scFv modules, their relatively poor thermostability often hampers their development as potential therapeutic drugs. In recent years, engineering antibody sequences to enhance their stability through mutations has gained considerable momentum. Because experimental methods for antibody engineering are time-intensive, laborious, and expensive, computational methods serve as a fast and inexpensive alternative to conventional routes. In this work, we show two machine learning approaches to better classify thermostable scFv variants from sequence: one uses pre-trained language models (PTLMs) that capture functional effects of sequence variation, and the other is a supervised convolutional neural network (CNN) trained with Rosetta energetic features. Both models are trained on temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. On out-of-distribution sequences (that is, sequences that are blind to the algorithm), we show that a sufficiently simple CNN model performs better than general pre-trained language models trained on diverse protein sequences (average Spearman correlation coefficient, ρ, of 0.4 as opposed to 0.15). On the other hand, an antibody-specific language model performs comparatively better than the CNN model on the same task (ρ = 0.52).
Further, we demonstrate that for an independent mAb with available thermal melting temperatures for 20 experimentally characterized thermostable mutations, these models trained on TS50 data could identify 18 residue positions and 5 identical amino-acid mutations, showing remarkable generalizability. Our results suggest that such models can be broadly applicable for improving the biological characteristics of antibodies. Further, transferring such models to alternative physicochemical properties of scFvs can have potential applications in optimizing large-scale production and delivery of mAbs or bsAbs.
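The abstract reports model quality as a Spearman correlation coefficient between predictions and measured thermostability. As a minimal sketch of how such a coefficient can be computed, the Python below ranks both lists (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the example numbers are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values for illustration only (not measurements from the paper).
measured_ts50 = [58.0, 61.5, 64.0, 66.5, 70.0]
predicted_score = [0.10, 0.35, 0.30, 0.80, 0.60]
rho = spearman_rho(measured_ts50, predicted_score)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the coefficient rewards a model for ordering variants correctly even when its scores are not on the TS50 scale.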
format Online
Article
Text
id pubmed-9872953
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Taylor & Francis
record_format MEDLINE/PubMed
spelling pubmed-9872953 2023-02-08 MAbs Report Taylor & Francis 2023-01-22 /pmc/articles/PMC9872953/ /pubmed/36683173 http://dx.doi.org/10.1080/19420862.2022.2163584 Text en © 2023 Amgen, Inc. Published with license by Taylor & Francis Group, LLC. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features
topic Report