Machine Learning Analysis of Naïve B-Cell Receptor Repertoires Stratifies Celiac Disease Patients and Controls

Bibliographic Details
Main authors: Shemesh, Or; Polak, Pazit; Lundin, Knut E. A.; Sollid, Ludvig M.; Yaari, Gur
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2021
Subjects: Immunology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8006302/
https://www.ncbi.nlm.nih.gov/pubmed/33790900
http://dx.doi.org/10.3389/fimmu.2021.627813
Collection: PubMed
Description: Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deamidated gluten peptides by disease-associated HLA-DQ variants to CD4+ T cells. In addition to gluten-specific CD4+ T cells, the patients have antibodies to transglutaminase 2 (autoantigen) and deamidated gluten peptides. These disease-specific antibodies recognize defined epitopes and display common usage of specific heavy and light chains across patients. Interactions between T cells and B cells are likely central in the pathogenesis, but how the repertoires of naïve T and B cells relate to the pathogenic effector cells is unexplored. To this end, we applied machine learning classification models to naïve B cell receptor (BCR) repertoires from CeD patients and healthy controls. Strikingly, we obtained a promising classification performance with an F1 score of 85%. Clusters of heavy and light chain sequences were inferred and used as features for the model, and signatures associated with the disease were then characterized. These signatures included amino acid (AA) 3-mers with distinct biophysicochemical characteristics and enriched V and J genes. We found that CeD-associated clusters can be identified and that common motifs can be characterized from naïve BCR repertoires. The results may indicate a genetic influence by BCR encoding genes in CeD. Analysis of naïve BCRs as presented here may become an important part of assessing individuals' risk of developing CeD. Our model demonstrates the potential of using BCR repertoires, and in particular naïve BCR repertoires, as disease susceptibility markers.
Record ID: pubmed-8006302
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Front Immunol
Subject: Immunology
Publication date: 2021-03-10 (Frontiers Media S.A.)
Copyright © 2021 Shemesh, Polak, Lundin, Sollid and Yaari.
License: http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.