
Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling

MOTIVATION: Identifying peptides associated with the major histocompatibility complex class II (MHCII) is a central task in the evaluation of the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including...

Bibliographic Details
Main authors: Hartout, Philip; Počuča, Bojana; Méndez-García, Celia; Schleberger, Christian
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10421966/
https://www.ncbi.nlm.nih.gov/pubmed/37527005
http://dx.doi.org/10.1093/bioinformatics/btad469
_version_ 1785089094262456320
author Hartout, Philip
Počuča, Bojana
Méndez-García, Celia
Schleberger, Christian
author_facet Hartout, Philip
Počuča, Bojana
Méndez-García, Celia
Schleberger, Christian
author_sort Hartout, Philip
collection PubMed
description MOTIVATION: Identifying peptides associated with the major histocompatibility complex class II (MHCII) is a central task in the evaluation of the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including the safety assessment of biologics and engineered derivatives in silico, or the fast progression of antigen-specific immunomodulatory drug discovery programs in immune disease and cancer. This has resulted in the collection of large-scale datasets on adaptive immune receptor antigenic responses and MHC-associated peptide proteomics. In parallel, recent deep learning algorithmic advances in protein language modeling have shown potential in leveraging large collections of sequence data and improving MHC presentation prediction. RESULTS: Here, we train a compact transformer model (AEGIS) on human and mouse MHCII immunopeptidome data, including a preclinical murine model, and evaluate its performance on the peptide presentation prediction task. We show that the transformer performs on par with existing deep learning algorithms and that combining datasets from multiple organisms increases model performance. We trained variants of the model with and without MHCII information. In both alternatives, the inclusion of peptides presented by the I-A(g7) MHC class II molecule expressed by nonobese diabetic mice enabled for the first time the accurate in silico prediction of presented peptides in a preclinical type 1 diabetes model organism, which has promising therapeutic applications. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/Novartis/AEGIS.
format Online
Article
Text
id pubmed-10421966
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10421966 2023-08-13 Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling Hartout, Philip Počuča, Bojana Méndez-García, Celia Schleberger, Christian Bioinformatics Original Paper MOTIVATION: Identifying peptides associated with the major histocompatibility complex class II (MHCII) is a central task in the evaluation of the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including the safety assessment of biologics and engineered derivatives in silico, or the fast progression of antigen-specific immunomodulatory drug discovery programs in immune disease and cancer. This has resulted in the collection of large-scale datasets on adaptive immune receptor antigenic responses and MHC-associated peptide proteomics. In parallel, recent deep learning algorithmic advances in protein language modeling have shown potential in leveraging large collections of sequence data and improving MHC presentation prediction. RESULTS: Here, we train a compact transformer model (AEGIS) on human and mouse MHCII immunopeptidome data, including a preclinical murine model, and evaluate its performance on the peptide presentation prediction task. We show that the transformer performs on par with existing deep learning algorithms and that combining datasets from multiple organisms increases model performance. We trained variants of the model with and without MHCII information. In both alternatives, the inclusion of peptides presented by the I-A(g7) MHC class II molecule expressed by nonobese diabetic mice enabled for the first time the accurate in silico prediction of presented peptides in a preclinical type 1 diabetes model organism, which has promising therapeutic applications. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/Novartis/AEGIS.
Oxford University Press 2023-08-01 /pmc/articles/PMC10421966/ /pubmed/37527005 http://dx.doi.org/10.1093/bioinformatics/btad469 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Paper
Hartout, Philip
Počuča, Bojana
Méndez-García, Celia
Schleberger, Christian
Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling
title Investigating the human and nonobese diabetic mouse MHC class II immunopeptidome using protein language modeling
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10421966/
https://www.ncbi.nlm.nih.gov/pubmed/37527005
http://dx.doi.org/10.1093/bioinformatics/btad469