
Prediction of celiac disease associated epitopes and motifs in a protein

INTRODUCTION: Celiac disease (CD) is an autoimmune gastrointestinal disorder that causes immune-mediated enteropathy in response to gluten. Gluten immunogenic peptides can trigger immune responses that damage the small intestine. HLA-DQ2/DQ8 are major alleles that bind to epitope/ant...


Bibliographic Details
Main Authors:	Tomer, Ritu; Patiyal, Sumeet; Dhall, Anjali; Raghava, Gajendra P. S.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Immunology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9893285/ https://www.ncbi.nlm.nih.gov/pubmed/36742312 http://dx.doi.org/10.3389/fimmu.2023.1056101

_version_	1784881492114014208
author	Tomer, Ritu Patiyal, Sumeet Dhall, Anjali Raghava, Gajendra P. S.
author_facet	Tomer, Ritu Patiyal, Sumeet Dhall, Anjali Raghava, Gajendra P. S.
author_sort	Tomer, Ritu
collection	PubMed
description	INTRODUCTION: Celiac disease (CD) is an autoimmune gastrointestinal disorder that causes immune-mediated enteropathy in response to gluten. Gluten immunogenic peptides can trigger immune responses that damage the small intestine. HLA-DQ2/DQ8 are the major alleles that bind to epitope/antigenic regions of gluten and induce celiac disease. There is a need to identify CD-associated epitopes in protein-based foods and therapeutics. METHODS: In this study, computational tools were developed to predict CD-associated epitopes and motifs. The dataset used for training, testing and evaluation contains experimentally validated CD-associated and non-CD-associated peptides. We performed positional analysis to identify the most significant position of an amino acid residue in the peptide and checked the frequency of HLA alleles. We also computed amino acid composition to develop machine learning based models, and we developed an ensemble method that combines the motif-based approach with the machine learning based models. RESULTS AND DISCUSSION: Our analysis supports the existing hypothesis that proline (P) and glutamine (Q) are highly abundant in CD-associated peptides. A model based on the density of P and Q in peptides was developed for predicting CD-associated peptides; it achieves a maximum AUROC of 0.98 on independent data. We discovered motifs (e.g., QPF, QPQ, PYP) that occur specifically in CD-associated peptides. We also developed machine learning based models using peptide composition and achieved a maximum AUROC of 0.99. The ensemble method, which combines the motif-based approach with the machine learning based models, predicts CD-associated motifs with 100% accuracy on an independent dataset not used for training. Finally, the best models and motifs have been integrated into a web server and standalone software package, “CDpred”. We hope this server will help the scientific community predict, design and scan CD-associated peptides as well as CD-associated motifs in a protein/peptide sequence (https://webs.iiitd.edu.in/raghava/cdpred/).
format	Online Article Text
id	pubmed-9893285
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9893285 2023-02-03 Prediction of celiac disease associated epitopes and motifs in a protein Tomer, Ritu Patiyal, Sumeet Dhall, Anjali Raghava, Gajendra P. S. Front Immunol Immunology Frontiers Media S.A. 2023-01-19 /pmc/articles/PMC9893285/ /pubmed/36742312 http://dx.doi.org/10.3389/fimmu.2023.1056101 Text en Copyright © 2023 Tomer, Patiyal, Dhall and Raghava https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Immunology Tomer, Ritu Patiyal, Sumeet Dhall, Anjali Raghava, Gajendra P. S. Prediction of celiac disease associated epitopes and motifs in a protein
title	Prediction of celiac disease associated epitopes and motifs in a protein
title_full	Prediction of celiac disease associated epitopes and motifs in a protein
title_fullStr	Prediction of celiac disease associated epitopes and motifs in a protein
title_full_unstemmed	Prediction of celiac disease associated epitopes and motifs in a protein
title_short	Prediction of celiac disease associated epitopes and motifs in a protein
title_sort	prediction of celiac disease associated epitopes and motifs in a protein
topic	Immunology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9893285/ https://www.ncbi.nlm.nih.gov/pubmed/36742312 http://dx.doi.org/10.3389/fimmu.2023.1056101
work_keys_str_mv	AT tomerritu predictionofceliacdiseaseassociatedepitopesandmotifsinaprotein AT patiyalsumeet predictionofceliacdiseaseassociatedepitopesandmotifsinaprotein AT dhallanjali predictionofceliacdiseaseassociatedepitopesandmotifsinaprotein AT raghavagajendraps predictionofceliacdiseaseassociatedepitopesandmotifsinaprotein
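
The description field above mentions three sequence-derived features: amino acid composition, the density of P and Q in a peptide, and short motifs such as QPF, QPQ and PYP. The following is a minimal sketch of how such features might be computed, not the CDpred implementation. The function names, the choice to express composition as a percentage and density as a fraction of peptide length, and the example peptide are all assumptions made for illustration; the record gives no thresholds or model details, so none are included.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Only the three motifs named in the abstract; the full motif set is not given there.
EXAMPLE_MOTIFS = ("QPF", "QPQ", "PYP")


def amino_acid_composition(peptide: str) -> dict[str, float]:
    """Percentage of each standard residue in the peptide (assumed definition)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


def pq_density(peptide: str) -> float:
    """Fraction of residues that are proline (P) or glutamine (Q) (assumed definition)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide must not be empty")
    return (seq.count("P") + seq.count("Q")) / len(seq)


def find_motifs(sequence: str, motifs=EXAMPLE_MOTIFS) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every occurrence, overlapping ones included."""
    seq = sequence.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            # Advance by one so overlapping occurrences are also reported.
            start = seq.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    peptide = "QPFPQPQLPY"  # illustrative string, not taken from the paper's dataset
    print(pq_density(peptide))
    print(find_motifs(peptide))
    print(amino_acid_composition(peptide)["P"])
```

A composition vector like this could serve as input to a classifier, and the motif hits could be combined with the classifier score in an ensemble, but how CDpred actually weights or combines them is not stated in this record.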


Similar Items