An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies
Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and inaccessibility of datasets for model training. In this study, we curated a dataset of >5,000 influenza h...
Main authors: | Wang, Yiquan; Lv, Huibin; Lei, Ruipeng; Yeung, Yuen-Hei; Shen, Ivana R.; Choi, Danbi; Teo, Qi Wen; Tan, Timothy J.C.; Gopal, Akshita B.; Chen, Xin; Graham, Claire S.; Wu, Nicholas C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515799/ https://www.ncbi.nlm.nih.gov/pubmed/37745338 http://dx.doi.org/10.1101/2023.09.11.557288 |
_version_ | 1785109021544415232 |
---|---|
author | Wang, Yiquan Lv, Huibin Lei, Ruipeng Yeung, Yuen-Hei Shen, Ivana R. Choi, Danbi Teo, Qi Wen Tan, Timothy J.C. Gopal, Akshita B. Chen, Xin Graham, Claire S. Wu, Nicholas C. |
author_sort | Wang, Yiquan |
collection | PubMed |
description | Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and inaccessibility of datasets for model training. In this study, we curated a dataset of >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM captured key sequence motifs of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of antibody response to influenza virus, but also provides an invaluable resource for applying deep learning to antibody research. |
format | Online Article Text |
id | pubmed-10515799 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10515799 2023-09-23 An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies Wang, Yiquan Lv, Huibin Lei, Ruipeng Yeung, Yuen-Hei Shen, Ivana R. Choi, Danbi Teo, Qi Wen Tan, Timothy J.C. Gopal, Akshita B. Chen, Xin Graham, Claire S. Wu, Nicholas C. bioRxiv Article Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and inaccessibility of datasets for model training. In this study, we curated a dataset of >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM captured key sequence motifs of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of antibody response to influenza virus, but also provides an invaluable resource for applying deep learning to antibody research. Cold Spring Harbor Laboratory 2023-09-14 /pmc/articles/PMC10515799/ /pubmed/37745338 http://dx.doi.org/10.1101/2023.09.11.557288 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515799/ https://www.ncbi.nlm.nih.gov/pubmed/37745338 http://dx.doi.org/10.1101/2023.09.11.557288 |