Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features
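Main authors: | Harmalkar, Ameya; Rao, Roshan; Richard Xie, Yuxuan; Honer, Jonas; Deisting, Wibke; Anlahr, Jonas; Hoenig, Anja; Czwikla, Julia; Sienz-Widmann, Eva; Rau, Doris; Rice, Austin J.; Riley, Timothy P.; Li, Danqing; Catterall, Hannah B.; Tinberg, Christine E.; Gray, Jeffrey J.; Wei, Kathy Y. |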
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Taylor & Francis, 2023 |
Subjects: | Report |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872953/ https://www.ncbi.nlm.nih.gov/pubmed/36683173 http://dx.doi.org/10.1080/19420862.2022.2163584 |
author | Harmalkar, Ameya; Rao, Roshan; Richard Xie, Yuxuan; Honer, Jonas; Deisting, Wibke; Anlahr, Jonas; Hoenig, Anja; Czwikla, Julia; Sienz-Widmann, Eva; Rau, Doris; Rice, Austin J.; Riley, Timothy P.; Li, Danqing; Catterall, Hannah B.; Tinberg, Christine E.; Gray, Jeffrey J.; Wei, Kathy Y. |
---|---|
collection | PubMed |
description | Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has steadily increased, as evidenced by the FDA's recent landmark approval of the 100th mAb. Unlike mAbs, which bind a single target, multispecific biologics (msAbs) have attracted particular interest because they can engage distinct targets. One important modular component of msAbs is the single-chain variable fragment (scFv). Despite the exquisite specificity and affinity of scFv modules, their relatively poor thermostability often hampers their development as therapeutics. In recent years, engineering antibody sequences to enhance stability through mutations has gained considerable momentum. Because experimental antibody engineering is time-intensive, laborious, and expensive, computational methods offer a fast and inexpensive alternative to conventional routes. In this work, we present two machine learning approaches for classifying thermostable scFv variants from sequence: one uses pre-trained language models (PTLMs) that capture the functional effects of sequence variation, and the other is a supervised convolutional neural network (CNN) trained on Rosetta energetic features. Both models are trained on temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. On out-of-distribution sequences (sequences held out from training and therefore unseen by the models), a relatively simple CNN outperforms general pre-trained language models trained on diverse protein sequences (average Spearman correlation coefficient of 0.4 versus 0.15). An antibody-specific language model, however, outperforms the CNN on the same task (Spearman correlation coefficient of 0.52). Further, for an independent mAb with thermal melting temperatures available for 20 experimentally characterized thermostabilizing mutations, the models trained on TS50 data identified 18 of the residue positions and 5 of the exact amino acid substitutions, indicating strong generalizability. Our results suggest that such models can be broadly applied to improve the biological characteristics of antibodies. Transferring such models to other physicochemical properties of scFvs could also help optimize the large-scale production and delivery of mAbs and bispecific antibodies (bsAbs). |
format | Online Article Text |
id | pubmed-9872953 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Taylor & Francis |
record_format | MEDLINE/PubMed |
journal | MAbs, Taylor & Francis, 2023-01-22 |
rights | © 2023 Amgen, Inc. Published with license by Taylor & Francis Group, LLC. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features |
topic | Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872953/ https://www.ncbi.nlm.nih.gov/pubmed/36683173 http://dx.doi.org/10.1080/19420862.2022.2163584 |
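The abstract compares models by the (average) Spearman correlation between predicted and measured thermostability on held-out sequences. The sketch below is an illustrative, non-authoritative implementation of that metric, not the authors' evaluation code. It computes Spearman's coefficient as the Pearson correlation of average ranks (the common tie-handling convention), and `average_spearman` assumes, hypothetically, that the reported average is an unweighted mean over held-out sets. All function names and example values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values in sorted order.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(predicted, measured):
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(predicted), average_ranks(measured))


def average_spearman(held_out_sets):
    """Mean Spearman correlation over (predicted, measured) pairs.

    Assumption (hypothetical): one pair per held-out set, averaged without weights.
    """
    return mean(spearman(p, m) for p, m in held_out_sets)


if __name__ == "__main__":
    scores = [0.10, 0.40, 0.35, 0.80]    # hypothetical model scores
    measured = [50.0, 55.0, 53.0, 60.0]  # hypothetical measured values
    print(spearman(scores, measured))    # 1.0 (identical orderings)
```

A rank correlation fits this kind of comparison because model scores and measured stability values sit on different scales; only their orderings are compared, so a model is rewarded for ranking variants correctly rather than for matching absolute values.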