Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency

The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of the...

Bibliographic Details
Main Authors: Deng, Lihua; Ly, Cedric; Abdollahi, Sina; Zhao, Yu; Prinz, Immo; Bonn, Stefan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Immunology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152969/
https://www.ncbi.nlm.nih.gov/pubmed/37143667
http://dx.doi.org/10.3389/fimmu.2023.1128326
author Deng, Lihua
Ly, Cedric
Abdollahi, Sina
Zhao, Yu
Prinz, Immo
Bonn, Stefan
collection PubMed
description The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.
format Online
Article
Text
id pubmed-10152969
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10152969 2023-05-03 Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency Deng, Lihua; Ly, Cedric; Abdollahi, Sina; Zhao, Yu; Prinz, Immo; Bonn, Stefan Front Immunol Immunology Frontiers Media S.A. 2023-04-18 /pmc/articles/PMC10152969/ /pubmed/37143667 http://dx.doi.org/10.3389/fimmu.2023.1128326 Text en Copyright © 2023 Deng, Ly, Abdollahi, Zhao, Prinz and Bonn https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency
topic Immunology