
An explainable language model for antibody specificity prediction using curated influenza hemagglutinin antibodies

Bibliographic Details
Main authors: Wang, Yiquan; Lv, Huibin; Lei, Ruipeng; Yeung, Yuen-Hei; Shen, Ivana R.; Choi, Danbi; Teo, Qi Wen; Tan, Timothy J.C.; Gopal, Akshita B.; Chen, Xin; Graham, Claire S.; Wu, Nicholas C.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515799/
https://www.ncbi.nlm.nih.gov/pubmed/37745338
http://dx.doi.org/10.1101/2023.09.11.557288
Collection: PubMed
Abstract: Despite decades of antibody research, it remains challenging to predict the specificity of an antibody solely based on its sequence. Two major obstacles are the lack of appropriate models and inaccessibility of datasets for model training. In this study, we curated a dataset of >5,000 influenza hemagglutinin (HA) antibodies by mining research publications and patents, which revealed many distinct sequence features between antibodies to HA head and stem domains. We then leveraged this dataset to develop a lightweight memory B cell language model (mBLM) for sequence-based antibody specificity prediction. Model explainability analysis showed that mBLM captured key sequence motifs of HA stem antibodies. Additionally, by applying mBLM to HA antibodies with unknown epitopes, we discovered and experimentally validated many HA stem antibodies. Overall, this study not only advances our molecular understanding of antibody response to influenza virus, but also provides an invaluable resource for applying deep learning to antibody research.
Record ID: pubmed-10515799, 2023-09-23
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Posted: bioRxiv (Cold Spring Harbor Laboratory), 2023-09-14
License: This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.