
Deciphering the language of antibodies using self-supervised learning

An individual’s B cell receptor (BCR) repertoire encodes information about past immune responses and potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from their amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information, generalizable to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.
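
A minimal sketch of the workflow the abstract describes, using the Hugging Face transformers library: a BERT/RoBERTa-style encoder pre-trained on BCR amino acid sequences yields contextualized per-residue embeddings, and a token-classification head fine-tuned on the same encoder scores each residue as paratope or not. The checkpoint name "antiberta-base", the space-separated residue tokenization, and the binary label scheme are illustrative assumptions, not the authors' released artifacts.

import torch
from transformers import (
    RobertaTokenizer,
    RobertaModel,
    RobertaForTokenClassification,
)

MODEL_NAME = "antiberta-base"  # hypothetical checkpoint name, for illustration only

# Heavy-chain variable-region fragment, one letter per residue (illustrative).
sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFTFS"

# Assumed tokenization: one token per amino acid (space-separated residues).
tokenizer = RobertaTokenizer.from_pretrained(MODEL_NAME)
inputs = tokenizer(" ".join(sequence), return_tensors="pt")

# (1) Pre-trained encoder -> contextualized per-residue embeddings.
encoder = RobertaModel.from_pretrained(MODEL_NAME)
with torch.no_grad():
    embeddings = encoder(**inputs).last_hidden_state    # (1, length + 2, hidden_dim)

# (2) Paratope prediction framed as per-residue binary token classification
#     (label 1 = antigen-binding residue), fine-tuned from the same encoder.
classifier = RobertaForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)
with torch.no_grad():
    logits = classifier(**inputs).logits                 # (1, length + 2, 2)

# Drop the <s> and </s> special tokens; keep P(paratope) for each residue.
paratope_probability = logits.softmax(dim=-1)[0, 1:-1, 1]
print(paratope_probability.shape)                        # torch.Size([30])

In this framing, paratope prediction is ordinary per-token classification on top of the pre-trained embeddings, which is also why the same representation can feed the other downstream tasks the abstract alludes to.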


Bibliographic Details
Main Authors: Leem, Jinwoo; Mitchell, Laura S.; Farmery, James H.R.; Barton, Justin; Galson, Jacob D.
Format: Online Article Text
Language: English
Journal: Patterns (N Y)
Published: Elsevier, 2022-05-18
Subjects: Article
License: © 2022 The Authors. Open access under the CC BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278498/
https://www.ncbi.nlm.nih.gov/pubmed/35845836
http://dx.doi.org/10.1016/j.patter.2022.100513