On TCR binding predictors failing to generalize to unseen peptides

Bibliographic Details
Main authors: Grazioli, Filippo, Mösch, Anja, Machart, Pierre, Li, Kai, Alqassem, Israa, O’Donnell, Timothy J., Min, Martin Renqiang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Immunology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9634250/
https://www.ncbi.nlm.nih.gov/pubmed/36341448
http://dx.doi.org/10.3389/fimmu.2022.1014256
_version_ 1784824426849632256
author Grazioli, Filippo
Mösch, Anja
Machart, Pierre
Li, Kai
Alqassem, Israa
O’Donnell, Timothy J.
Min, Martin Renqiang
author_facet Grazioli, Filippo
Mösch, Anja
Machart, Pierre
Li, Kai
Alqassem, Israa
O’Donnell, Timothy J.
Min, Martin Renqiang
author_sort Grazioli, Filippo
collection PubMed
description Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets that include peptide sequences also present in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples TChard. We propose the hard split, a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models’ test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation of why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.
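The description above characterizes the hard split only by its guarantee: no test peptide appears in the training set. The following is a minimal sketch of a split with that property, not the authors' actual heuristic, whose details this record does not give. The function name `peptide_disjoint_split`, the `(cdr3, peptide, label)` tuple layout, and the toy sequences are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical sample layout: (CDR3 sequence, peptide sequence, binding label).
Sample = Tuple[str, str, int]


def peptide_disjoint_split(
    samples: List[Sample], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that no peptide occurs in both training and test sets.

    The split is made over unique peptides, not over individual samples,
    so the resulting sample proportions only approximate test_fraction.
    """
    peptides = sorted({peptide for _, peptide, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(peptides)

    # Hold out at least one peptide for testing.
    n_test = max(1, round(test_fraction * len(peptides)))
    test_peptides = set(peptides[:n_test])

    train = [s for s in samples if s[1] not in test_peptides]
    test = [s for s in samples if s[1] in test_peptides]
    return train, test


# Toy data for illustration only; these are not records from TChard.
toy_samples: List[Sample] = [
    ("CASSAAAF", "PEPTIDEA", 1),
    ("CASSBBBF", "PEPTIDEA", 0),
    ("CASSCCCF", "PEPTIDEB", 1),
    ("CASSDDDF", "PEPTIDEC", 1),
    ("CASSEEEF", "PEPTIDED", 0),
    ("CASSFFFF", "PEPTIDEE", 1),
]

train_set, test_set = peptide_disjoint_split(toy_samples, test_fraction=0.2, seed=0)
train_peptides = {s[1] for s in train_set}
held_out_peptides = {s[1] for s in test_set}
```

Splitting over unique peptides rather than over rows is what prevents a peptide from leaking across the boundary; a plain random split over rows would not give that guarantee. scikit-learn's `GroupShuffleSplit`, with the peptide as the group key, is an off-the-shelf way to obtain the same property.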
format Online
Article
Text
id pubmed-9634250
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9634250 2022-11-05 On TCR binding predictors failing to generalize to unseen peptides Grazioli, Filippo Mösch, Anja Machart, Pierre Li, Kai Alqassem, Israa O’Donnell, Timothy J. Min, Martin Renqiang Front Immunol Immunology Frontiers Media S.A. 2022-10-21 /pmc/articles/PMC9634250/ /pubmed/36341448 http://dx.doi.org/10.3389/fimmu.2022.1014256 Text en Copyright © 2022 Grazioli, Mösch, Machart, Li, Alqassem, O’Donnell and Min https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Immunology
Grazioli, Filippo
Mösch, Anja
Machart, Pierre
Li, Kai
Alqassem, Israa
O’Donnell, Timothy J.
Min, Martin Renqiang
On TCR binding predictors failing to generalize to unseen peptides
title On TCR binding predictors failing to generalize to unseen peptides
title_full On TCR binding predictors failing to generalize to unseen peptides
title_fullStr On TCR binding predictors failing to generalize to unseen peptides
title_full_unstemmed On TCR binding predictors failing to generalize to unseen peptides
title_short On TCR binding predictors failing to generalize to unseen peptides
title_sort on tcr binding predictors failing to generalize to unseen peptides
topic Immunology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9634250/
https://www.ncbi.nlm.nih.gov/pubmed/36341448
http://dx.doi.org/10.3389/fimmu.2022.1014256