On TCR binding predictors failing to generalize to unseen peptides
Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets, which include peptide sequences that are also included in the training set. In this work, we investigate how stat...
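Main authors: | Grazioli, Filippo; Mösch, Anja; Machart, Pierre; Li, Kai; Alqassem, Israa; O’Donnell, Timothy J.; Min, Martin Renqiang |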
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Immunology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9634250/ https://www.ncbi.nlm.nih.gov/pubmed/36341448 http://dx.doi.org/10.3389/fimmu.2022.1014256 |
_version_ | 1784824426849632256 |
---|---|
author | Grazioli, Filippo; Mösch, Anja; Machart, Pierre; Li, Kai; Alqassem, Israa; O’Donnell, Timothy J.; Min, Martin Renqiang |
author_facet | Grazioli, Filippo; Mösch, Anja; Machart, Pierre; Li, Kai; Alqassem, Israa; O’Donnell, Timothy J.; Min, Martin Renqiang |
author_sort | Grazioli, Filippo |
collection | PubMed |
description | Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets, which include peptide sequences that are also included in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples TChard. We propose the hard split, a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models’ test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved. |
format | Online Article Text |
id | pubmed-9634250 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9634250 2022-11-05 On TCR binding predictors failing to generalize to unseen peptides Grazioli, Filippo Mösch, Anja Machart, Pierre Li, Kai Alqassem, Israa O’Donnell, Timothy J. Min, Martin Renqiang Front Immunol Immunology Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets, which include peptide sequences that are also included in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples TChard. We propose the hard split, a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models’ test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved. Frontiers Media S.A. 2022-10-21 /pmc/articles/PMC9634250/ /pubmed/36341448 http://dx.doi.org/10.3389/fimmu.2022.1014256 Text en Copyright © 2022 Grazioli, Mösch, Machart, Li, Alqassem, O’Donnell and Min https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Immunology Grazioli, Filippo Mösch, Anja Machart, Pierre Li, Kai Alqassem, Israa O’Donnell, Timothy J. Min, Martin Renqiang On TCR binding predictors failing to generalize to unseen peptides |
title | On TCR binding predictors failing to generalize to unseen peptides |
title_full | On TCR binding predictors failing to generalize to unseen peptides |
title_fullStr | On TCR binding predictors failing to generalize to unseen peptides |
title_full_unstemmed | On TCR binding predictors failing to generalize to unseen peptides |
title_short | On TCR binding predictors failing to generalize to unseen peptides |
title_sort | on tcr binding predictors failing to generalize to unseen peptides |
topic | Immunology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9634250/ https://www.ncbi.nlm.nih.gov/pubmed/36341448 http://dx.doi.org/10.3389/fimmu.2022.1014256 |
work_keys_str_mv | AT graziolifilippo ontcrbindingpredictorsfailingtogeneralizetounseenpeptides AT moschanja ontcrbindingpredictorsfailingtogeneralizetounseenpeptides AT machartpierre ontcrbindingpredictorsfailingtogeneralizetounseenpeptides AT likai ontcrbindingpredictorsfailingtogeneralizetounseenpeptides AT alqassemisraa ontcrbindingpredictorsfailingtogeneralizetounseenpeptides AT odonnelltimothyj ontcrbindingpredictorsfailingtogeneralizetounseenpeptides AT minmartinrenqiang ontcrbindingpredictorsfailingtogeneralizetounseenpeptides |
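
The description above characterizes the hard split only by its key property: no peptide in the test set may also appear in the training set. The following is a minimal sketch of a split with that property, not the authors' actual heuristic, whose selection rule is not given in this record. The function name `hard_split`, the `(cdr3, peptide, label)` tuple layout, and the `test_fraction` and `seed` parameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def hard_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that train and test share no peptide.

    samples: list of (cdr3, peptide, label) tuples (hypothetical layout).
    Returns (train, test) lists of the same tuples.
    """
    # Group all samples by their peptide so a peptide is never divided.
    by_peptide = defaultdict(list)
    for sample in samples:
        by_peptide[sample[1]].append(sample)

    # Shuffle the peptides (not the samples) reproducibly.
    peptides = sorted(by_peptide)
    random.Random(seed).shuffle(peptides)

    # Assign whole peptide groups to the test set until it reaches
    # roughly the requested share of samples; the rest go to training.
    target = test_fraction * len(samples)
    train, test = [], []
    for peptide in peptides:
        if len(test) < target:
            test.extend(by_peptide[peptide])
        else:
            train.extend(by_peptide[peptide])
    return train, test
```

Because whole peptide groups are assigned at once, the resulting test share only approximates `test_fraction` when peptides have very different numbers of associated TCRs. A random per-sample split would not have the disjointness property, which is the difference the record's description attributes to the observed drop in test performance.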