Deciphering the language of antibodies using self-supervised learning
An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.
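The abstract describes AntiBERTa as a BERT-family masked language model that, after pre-training, yields a contextualized embedding for each residue of a BCR sequence. Below is a minimal sketch of how such per-residue embeddings are typically extracted with the Hugging Face transformers library; the checkpoint path, the one-token-per-residue tokenization, and the example sequence are illustrative assumptions, not details taken from this record.

```python
# Minimal sketch, not the authors' published pipeline: extracting
# per-residue contextual embeddings from a BERT-family antibody
# language model such as AntiBERTa via Hugging Face transformers.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "path/to/antiberta-checkpoint"  # placeholder, not an official model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModel.from_pretrained(MODEL_PATH)
model.eval()

# Hypothetical (truncated) heavy-chain variable-region sequence.
# Antibody language models commonly tokenize one amino acid per token,
# so the residues are passed space-separated; this is an assumption here.
heavy_chain = "EVQLVESGGGLVQPGGSLRLSCAAS"
enc = tokenizer(" ".join(heavy_chain), return_tensors="pt")

with torch.no_grad():
    out = model(**enc)

# last_hidden_state has shape (batch, tokens, hidden_size); the first and
# last positions are special tokens, so the middle rows give one contextual
# embedding vector per residue.
per_residue = out.last_hidden_state[0, 1:-1]
print(per_residue.shape)
```

A companion sketch of the paratope-prediction fine-tuning named as the paper's case study appears after the full record below.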
Main Authors: | Leem, Jinwoo; Mitchell, Laura S.; Farmery, James H.R.; Barton, Justin; Galson, Jacob D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2022 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278498/ https://www.ncbi.nlm.nih.gov/pubmed/35845836 http://dx.doi.org/10.1016/j.patter.2022.100513 |
_version_ | 1784746199619731456 |
---|---|
author | Leem, Jinwoo Mitchell, Laura S. Farmery, James H.R. Barton, Justin Galson, Jacob D. |
author_facet | Leem, Jinwoo Mitchell, Laura S. Farmery, James H.R. Barton, Justin Galson, Jacob D. |
author_sort | Leem, Jinwoo |
collection | PubMed |
description | An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies. |
format | Online Article Text |
id | pubmed-9278498 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9278498 2022-07-14 Deciphering the language of antibodies using self-supervised learning Leem, Jinwoo Mitchell, Laura S. Farmery, James H.R. Barton, Justin Galson, Jacob D. Patterns (N Y) Article An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies. Elsevier 2022-05-18 /pmc/articles/PMC9278498/ /pubmed/35845836 http://dx.doi.org/10.1016/j.patter.2022.100513 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Leem, Jinwoo Mitchell, Laura S. Farmery, James H.R. Barton, Justin Galson, Jacob D. Deciphering the language of antibodies using self-supervised learning |
title | Deciphering the language of antibodies using self-supervised learning |
title_full | Deciphering the language of antibodies using self-supervised learning |
title_fullStr | Deciphering the language of antibodies using self-supervised learning |
title_full_unstemmed | Deciphering the language of antibodies using self-supervised learning |
title_short | Deciphering the language of antibodies using self-supervised learning |
title_sort | deciphering the language of antibodies using self-supervised learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278498/ https://www.ncbi.nlm.nih.gov/pubmed/35845836 http://dx.doi.org/10.1016/j.patter.2022.100513 |
work_keys_str_mv | AT leemjinwoo decipheringthelanguageofantibodiesusingselfsupervisedlearning AT mitchelllauras decipheringthelanguageofantibodiesusingselfsupervisedlearning AT farmeryjameshr decipheringthelanguageofantibodiesusingselfsupervisedlearning AT bartonjustin decipheringthelanguageofantibodiesusingselfsupervisedlearning AT galsonjacobd decipheringthelanguageofantibodiesusingselfsupervisedlearning |
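The case study in the description field fine-tunes AntiBERTa to predict paratope positions, which amounts to token classification over residues. Below is a minimal sketch of that kind of setup, assuming a generic transformers token-classification head rather than the authors' published training code; the checkpoint path, sequence, and labels are all placeholders.

```python
# Minimal sketch of paratope prediction as token classification;
# everything below the imports is illustrative, not the paper's code.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_PATH = "path/to/antiberta-checkpoint"  # placeholder, not an official model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
# Two classes per residue: 0 = non-paratope, 1 = paratope. Loading a
# masked-LM checkpoint here initializes a fresh classification head,
# which is the usual starting point for fine-tuning.
model = AutoModelForTokenClassification.from_pretrained(MODEL_PATH, num_labels=2)

# Hypothetical sequence and labels; real training data would pair antibody
# sequences with paratope annotations from antibody-antigen structures.
sequence = "EVQLVESGGGLVQPGGSLRLSCAAS"
residue_labels = torch.zeros(len(sequence), dtype=torch.long)
residue_labels[8:12] = 1  # pretend these four residues contact antigen

enc = tokenizer(" ".join(sequence), return_tensors="pt")
# Align labels with tokens, again assuming one token per residue: special
# tokens get -100, which the cross-entropy loss ignores.
labels = torch.full((enc["input_ids"].shape[1],), -100, dtype=torch.long)
labels[1:1 + len(sequence)] = residue_labels

out = model(**enc, labels=labels.unsqueeze(0))
out.loss.backward()  # an optimizer step would follow in a real training loop
print(out.logits.shape)  # (1, token_count, 2): per-residue paratope scores
```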