Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency
The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of the...
Main authors: | Deng, Lihua; Ly, Cedric; Abdollahi, Sina; Zhao, Yu; Prinz, Immo; Bonn, Stefan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Immunology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152969/ https://www.ncbi.nlm.nih.gov/pubmed/37143667 http://dx.doi.org/10.3389/fimmu.2023.1128326 |
_version_ | 1785035847578419200 |
---|---|
author | Deng, Lihua Ly, Cedric Abdollahi, Sina Zhao, Yu Prinz, Immo Bonn, Stefan |
author_facet | Deng, Lihua Ly, Cedric Abdollahi, Sina Zhao, Yu Prinz, Immo Bonn, Stefan |
author_sort | Deng, Lihua |
collection | PubMed |
description | The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches. |
format | Online Article Text |
id | pubmed-10152969 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10152969 2023-05-03 Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency Deng, Lihua Ly, Cedric Abdollahi, Sina Zhao, Yu Prinz, Immo Bonn, Stefan Front Immunol Immunology The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches. Frontiers Media S.A. 2023-04-18 /pmc/articles/PMC10152969/ /pubmed/37143667 http://dx.doi.org/10.3389/fimmu.2023.1128326 Text en Copyright © 2023 Deng, Ly, Abdollahi, Zhao, Prinz and Bonn https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Immunology Deng, Lihua Ly, Cedric Abdollahi, Sina Zhao, Yu Prinz, Immo Bonn, Stefan Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency |
title | Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency |
title_full | Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency |
title_fullStr | Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency |
title_full_unstemmed | Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency |
title_short | Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency |
title_sort | performance comparison of tcr-pmhc prediction tools reveals a strong data dependency |
topic | Immunology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152969/ https://www.ncbi.nlm.nih.gov/pubmed/37143667 http://dx.doi.org/10.3389/fimmu.2023.1128326 |
work_keys_str_mv | AT denglihua performancecomparisonoftcrpmhcpredictiontoolsrevealsastrongdatadependency AT lycedric performancecomparisonoftcrpmhcpredictiontoolsrevealsastrongdatadependency AT abdollahisina performancecomparisonoftcrpmhcpredictiontoolsrevealsastrongdatadependency AT zhaoyu performancecomparisonoftcrpmhcpredictiontoolsrevealsastrongdatadependency AT prinzimmo performancecomparisonoftcrpmhcpredictiontoolsrevealsastrongdatadependency AT bonnstefan performancecomparisonoftcrpmhcpredictiontoolsrevealsastrongdatadependency |