Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences
BACKGROUND: Deep immune receptor sequencing, RepSeq, provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, due to the immense diversity of the immune repertoire, identification of condit...
Main Authors: | Yohannes, Dawit A.; Kaukinen, Katri; Kurppa, Kalle; Saavalainen, Päivi; Greco, Dario |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7993519/ https://www.ncbi.nlm.nih.gov/pubmed/33765908 http://dx.doi.org/10.1186/s12859-021-04087-7 |
_version_ | 1783669576032059392 |
---|---|
author | Yohannes, Dawit A.; Kaukinen, Katri; Kurppa, Kalle; Saavalainen, Päivi; Greco, Dario |
author_facet | Yohannes, Dawit A.; Kaukinen, Katri; Kurppa, Kalle; Saavalainen, Päivi; Greco, Dario |
author_sort | Yohannes, Dawit A. |
collection | PubMed |
description | BACKGROUND: Deep immune receptor sequencing, RepSeq, provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, due to the immense diversity of the immune repertoire, identification of condition-relevant TCR CDR3s from total repertoires has mostly been limited to either “public” CDR3 sequences or to comparisons of CDR3 frequencies observed in a single individual. A methodology for the identification of condition-associated TCR CDR3s by direct population-level comparison of RepSeq samples is currently lacking. RESULTS: We present a method for direct population-level comparison of RepSeq samples using immune repertoire sub-units (or sub-repertoires) that are shared across individuals. The method first performs unsupervised clustering of CDR3s within each sample. It then finds matching clusters across samples, called immune sub-repertoires, and performs statistical differential abundance testing at the level of the identified sub-repertoires. It finally ranks CDR3s in differentially abundant sub-repertoires for relevance to the condition. We applied the method to total TCR CDR3β RepSeq datasets of celiac disease patients, as well as to public datasets of yellow fever vaccination. The method successfully identified celiac disease-associated CDR3β sequences, as evidenced by considerable agreement of TRBV-gene and positional amino acid usage patterns in the detected CDR3β sequences with previously known CDR3βs specific to gluten in celiac disease. It also successfully recovered significantly higher numbers of previously known CDR3β sequences relevant to each condition than would be expected by chance. CONCLUSION: We conclude that immune sub-repertoires of similar immuno-genomic features shared across unrelated individuals can serve as viable units of immune repertoire comparison, serving as a proxy for identification of condition-associated CDR3s. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04087-7. |
format | Online Article Text |
id | pubmed-7993519 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7993519 2021-03-26 Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences Yohannes, Dawit A. Kaukinen, Katri Kurppa, Kalle Saavalainen, Päivi Greco, Dario BMC Bioinformatics Methodology Article BioMed Central 2021-03-25 /pmc/articles/PMC7993519/ /pubmed/33765908 http://dx.doi.org/10.1186/s12859-021-04087-7 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Article Yohannes, Dawit A. Kaukinen, Katri Kurppa, Kalle Saavalainen, Päivi Greco, Dario Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences |
title | Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7993519/ https://www.ncbi.nlm.nih.gov/pubmed/33765908 http://dx.doi.org/10.1186/s12859-021-04087-7 |
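
The four steps summarized in the description field (within-sample clustering, matching clusters across samples, differential abundance testing, and CDR3 ranking) can be illustrated with a toy sketch. This is not the authors' implementation: the similarity rule (equal length and Hamming distance of at most 1), the exact permutation test, the ranking by difference in mean frequency, all function names, and the made-up sequences and counts below are assumptions chosen only to keep the example small and runnable.

```python
from itertools import combinations
from statistics import mean

def similar(a, b, max_dist=1):
    # Assumed similarity rule: same length and Hamming distance <= max_dist.
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_dist

def connected_groups(items, linked):
    # Union-find: group items into connected components under `linked`.
    parent = list(range(len(items)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(items)), 2):
        if linked(items[i], items[j]):
            parent[find(i)] = find(j)
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())

def sub_repertoires(samples):
    # Step 1: cluster CDR3s within each sample separately.
    clusters = []
    for counts in samples.values():
        clusters += [frozenset(g) for g in connected_groups(sorted(counts), similar)]
    # Step 2: match clusters across samples; merged clusters form a sub-repertoire.
    def match(c1, c2):
        return any(similar(a, b) for a in c1 for b in c2)
    return [frozenset().union(*g) for g in connected_groups(clusters, match)]

def abundance(samples, sub):
    # Fraction of each sample's total count that falls in the sub-repertoire.
    return {sid: sum(n for s, n in counts.items() if s in sub) / sum(counts.values())
            for sid, counts in samples.items()}

def permutation_p(values, cases):
    # Step 3 (stand-in test): exact two-sided permutation p-value for the
    # difference in mean abundance between cases and the remaining samples.
    ids = sorted(values)
    def diff(group):
        inside = [values[i] for i in ids if i in group]
        outside = [values[i] for i in ids if i not in group]
        return abs(mean(inside) - mean(outside))
    observed = diff(set(cases))
    perms = [diff(set(g)) for g in combinations(ids, len(cases))]
    return sum(d >= observed - 1e-12 for d in perms) / len(perms)

def rank_cdr3s(samples, sub, cases):
    # Step 4 (stand-in ranking): order CDR3s by mean case frequency minus
    # mean control frequency, largest first.
    def freq(sid, s):
        return samples[sid].get(s, 0) / sum(samples[sid].values())
    controls = [sid for sid in samples if sid not in cases]
    score = {s: mean([freq(i, s) for i in cases]) - mean([freq(i, s) for i in controls])
             for s in sub}
    return sorted(sub, key=lambda s: (-score[s], s))

# Made-up toy data: {sample_id: {CDR3 amino acid sequence: read count}}.
samples = {
    "case1": {"CASSLGQETQYF": 30, "CASSLGQDTQYF": 10, "CASSPDRGEQFF": 10},
    "case2": {"CASSLGQETQYF": 20, "CASSIGQETQYF": 20, "CASSPDRGEQFF": 10},
    "ctrl1": {"CASSLGQETQYF": 2, "CASSPDRGEQFF": 48},
    "ctrl2": {"CASSLGQDTQYF": 1, "CASSPDRGAQFF": 49},
}
cases = ["case1", "case2"]

subs = sub_repertoires(samples)
for sub in subs:
    p = permutation_p(abundance(samples, sub), cases)
    print(sorted(sub), round(p, 3), rank_cdr3s(samples, sub, cases))
```

With only four samples the permutation test cannot return a p-value below 1/3, so the toy data shows the mechanics rather than a meaningful significance call; real use would need many more samples per group and a test suited to the data.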