AbLang: an antibody language model for completing antibody sequences

Bibliographic details
Main authors: Olsen, Tobias H, Moal, Iain H, Deane, Charlotte M
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710568/
https://www.ncbi.nlm.nih.gov/pubmed/36699403
http://dx.doi.org/10.1093/bioadv/vbac046
collection PubMed
description
MOTIVATION: General protein language models have been shown to summarize the semantics of protein sequences into representations that are useful for state-of-the-art predictive methods. However, for antibody-specific problems, such as restoring residues lost due to sequencing errors, a model trained solely on antibodies may be more powerful. Antibodies are one of the few protein types where the volume of sequence data needed for such language models is available, e.g. in the Observed Antibody Space (OAS) database.
RESULTS: Here, we introduce AbLang, a language model trained on the antibody sequences in the OAS database. We demonstrate the power of AbLang by using it to restore missing residues in antibody sequence data, a key issue with B-cell receptor repertoire sequencing, e.g. over 40% of OAS sequences are missing the first 15 amino acids. AbLang restores the missing residues of antibody sequences better than using IMGT germlines or the general protein language model ESM-1b. Further, AbLang does not require knowledge of the germline of the antibody and is seven times faster than ESM-1b.
AVAILABILITY AND IMPLEMENTATION: AbLang is a Python package available at https://github.com/oxpig/AbLang.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
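The restoration task in the abstract treats lost residues as unknown positions that a masked language model fills in. The sketch below only illustrates how a sequence missing its first 15 residues might be prepared for such a model; it does not call AbLang itself. The mask character `*` and the helper names `mask_missing_nterm` and `masked_positions` are assumptions made for illustration; check the README at https://github.com/oxpig/AbLang for the package's actual input conventions.

```python
# Illustrative sketch only: prepares an N-terminally truncated antibody
# sequence for a masked language model. It does not run AbLang.
# ASSUMPTION: "*" marks unknown residues; verify against the AbLang README.

MASK = "*"


def mask_missing_nterm(partial_seq: str, n_missing: int, mask: str = MASK) -> str:
    """Hypothetical helper: prefix `partial_seq` with `n_missing` mask tokens."""
    if n_missing < 0:
        raise ValueError("n_missing must be non-negative")
    return mask * n_missing + partial_seq


def masked_positions(seq: str, mask: str = MASK) -> list[int]:
    """Hypothetical helper: 0-based indices the model would be asked to fill in."""
    return [i for i, aa in enumerate(seq) if aa == mask]


if __name__ == "__main__":
    # Illustrative heavy-chain fragment whose first 15 residues are missing,
    # the truncation the abstract reports for over 40% of OAS sequences.
    fragment = (
        "PGKSLRLSCVASGFTFSGYGMHWVRQAPGKGLEWIALIIYDESNKYYADSVKGRFTISRDNSKNTLYLQMSS"
        "LRAEDTAVFYCAKVKFYDPTAPNDYWGQGTLVTVSS"
    )
    query = mask_missing_nterm(fragment, 15)
    print(query[:20])                     # ***************PGKSL
    print(len(masked_positions(query)))   # 15
```

Per the abstract, the model predicts these positions from the sequence context alone, so no germline assignment is needed for this step.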
id pubmed-9710568
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Bioinform Adv (Original Paper), Oxford University Press, 2022-06-17
License: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Paper