Large-Scale Structure-Based Prediction of Stable Peptide Binding to Class I HLAs Using Random Forests

Bibliographic Details
Main authors: Abella, Jayvee R., Antunes, Dinler A., Clementi, Cecilia, Kavraki, Lydia E.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Immunology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7387700/
https://www.ncbi.nlm.nih.gov/pubmed/32793224
http://dx.doi.org/10.3389/fimmu.2020.01583
author Abella, Jayvee R.
Antunes, Dinler A.
Clementi, Cecilia
Kavraki, Lydia E.
collection PubMed
description Prediction of stable peptide binding to Class I HLAs is an important component for designing immunotherapies. While the best performing predictors are based on machine learning algorithms trained on peptide-HLA (pHLA) sequences, the use of structure for training predictors deserves further exploration. Given enough pHLA structures, a predictor based on the residue-residue interactions found in these structures has the potential to generalize for alleles with little or no experimental data. We have previously developed APE-Gen, a modeling approach able to produce pHLA structures in a scalable manner. In this work we use APE-Gen to model over 150,000 pHLA structures, the largest dataset of its kind, which were used to train a structure-based pan-allele model. We extract simple, homogenous features based on residue-residue distances between peptide and HLA, and build a random forest model for predicting stable pHLA binding. Our model achieves competitive AUROC values on leave-one-allele-out validation tests using significantly less data when compared to popular sequence-based methods. Additionally, our model offers an interpretation analysis that can reveal how the model composes the features to arrive at any given prediction. This interpretation analysis can be used to check if the model is in line with chemical intuition, and we showcase particular examples. Our work is a significant step toward using structure to achieve generalizable and more interpretable prediction for stable pHLA binding.
format Online
Article
Text
id pubmed-7387700
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7387700 2020-08-12 Large-Scale Structure-Based Prediction of Stable Peptide Binding to Class I HLAs Using Random Forests Abella, Jayvee R. Antunes, Dinler A. Clementi, Cecilia Kavraki, Lydia E. Front Immunol Immunology Frontiers Media S.A. 2020-07-22 /pmc/articles/PMC7387700/ /pubmed/32793224 http://dx.doi.org/10.3389/fimmu.2020.01583 Text en Copyright © 2020 Abella, Antunes, Clementi and Kavraki.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Large-Scale Structure-Based Prediction of Stable Peptide Binding to Class I HLAs Using Random Forests
topic Immunology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7387700/
https://www.ncbi.nlm.nih.gov/pubmed/32793224
http://dx.doi.org/10.3389/fimmu.2020.01583
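
The description in this record mentions leave-one-allele-out validation scored by AUROC. The following is a minimal sketch of that evaluation loop only, not the authors' pipeline. Everything in it is an assumption made for illustration: the data are synthetic, the allele names (`allele_A` and so on) are placeholders, the four features merely stand in for residue-residue distances, and a nearest-centroid scorer is swapped in for the random forest so that the example needs only the Python standard library.

```python
import math
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_allele_out(samples):
    """Yield (held_out_allele, train, test); each sample is (allele, features, label)."""
    for held_out in sorted({allele for allele, _, _ in samples}):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def fit_centroids(train):
    """Stand-in model (NOT a random forest): the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for _, x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def score(centroids, x):
    """Higher score means x lies closer to the binder centroid than to the non-binder one."""
    return math.dist(x, centroids[0]) - math.dist(x, centroids[1])


def make_toy_samples(seed=0, per_class=20, n_features=4):
    """Synthetic data: binders (label 1) get smaller hypothetical distances than non-binders."""
    rng = random.Random(seed)
    samples = []
    for allele in ("allele_A", "allele_B", "allele_C"):
        for label, mean in ((1, 4.0), (0, 8.0)):
            for _ in range(per_class):
                features = [rng.gauss(mean, 1.0) for _ in range(n_features)]
                samples.append((allele, features, label))
    return samples


def evaluate(samples):
    """Train on all alleles but one, score the held-out allele, and report its AUROC."""
    results = {}
    for held_out, train, test in leave_one_allele_out(samples):
        centroids = fit_centroids(train)
        labels = [y for _, _, y in test]
        scores = [score(centroids, x) for _, x, _ in test]
        results[held_out] = auroc(labels, scores)
    return results


if __name__ == "__main__":
    for allele, value in evaluate(make_toy_samples()).items():
        print(allele, round(value, 3))
```

The point of the split is that no sample from the held-out allele is seen during fitting, so each AUROC estimates how well the model transfers to an allele without training data. Replacing `fit_centroids` and `score` with a real random forest would leave the loop unchanged.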