
Clustering based approach for population level identification of condition-associated T-cell receptor β-chain CDR3 sequences

Bibliographic Details
Main authors: Yohannes, Dawit A., Kaukinen, Katri, Kurppa, Kalle, Saavalainen, Päivi, Greco, Dario
Format: Online article, text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7993519/
https://www.ncbi.nlm.nih.gov/pubmed/33765908
http://dx.doi.org/10.1186/s12859-021-04087-7
collection PubMed
description BACKGROUND: Deep immune receptor sequencing (RepSeq) provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, due to the immense diversity of the immune repertoire, identification of condition-relevant TCR CDR3s from total repertoires has mostly been limited either to “public” CDR3 sequences or to comparisons of CDR3 frequencies observed in a single individual. A methodology for identifying condition-associated TCR CDR3s by direct population-level comparison of RepSeq samples is currently lacking. RESULTS: We present a method for direct population-level comparison of RepSeq samples using immune repertoire sub-units (or sub-repertoires) that are shared across individuals. The method first performs unsupervised clustering of CDR3s within each sample. It then finds matching clusters across samples, called immune sub-repertoires, and performs statistical differential abundance testing at the level of the identified sub-repertoires. Finally, it ranks CDR3s in differentially abundant sub-repertoires for relevance to the condition. We applied the method to total TCR CDR3β RepSeq datasets of celiac disease patients, as well as to public datasets of yellow fever vaccination. The method successfully identified celiac disease-associated CDR3β sequences, as evidenced by considerable agreement of TRBV-gene and positional amino acid usage patterns between the detected CDR3β sequences and previously known CDR3βs specific to gluten in celiac disease. It also recovered significantly more previously known CDR3β sequences relevant to each condition than would be expected by chance. CONCLUSION: We conclude that immune sub-repertoires of similar immuno-genomic features shared across unrelated individuals can serve as viable units of immune repertoire comparison, serving as a proxy for identification of condition-associated CDR3s.
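The four steps described under RESULTS can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and the abstract does not specify these details: the similarity rule (equal-length CDR3s differing at no more than one amino acid), single-linkage clustering within each sample, matching clusters across samples through their most abundant CDR3, an exact one-sided permutation test on mean relative abundance, a fixed cut-off of 0.05 with no multiple-testing correction, and ranking by total read count in case samples are all assumptions chosen for illustration. The function names (`cluster_sample`, `sub_repertoires`, `permutation_p`, `rank_cdr3s`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical sample format: CDR3 amino-acid sequence -> read count.
Sample = dict[str, int]


class UnionFind:
    """Minimal disjoint-set structure used for single-linkage grouping."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def groups(self):
        out = defaultdict(list)
        for x in self.parent:
            out[self.find(x)].append(x)
        return list(out.values())


def close(a: str, b: str, max_dist: int = 1) -> bool:
    """Assumed similarity rule: equal length and at most max_dist mismatches."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_dist


def cluster_sample(sample: Sample) -> list[list[str]]:
    """Step 1: unsupervised single-linkage clustering within one sample."""
    uf = UnionFind(sample)
    for a, b in combinations(sample, 2):
        if close(a, b):
            uf.union(a, b)
    return uf.groups()


def sub_repertoires(samples: list[Sample]) -> list[set[str]]:
    """Step 2 (assumed matching rule): link clusters from different samples
    whose representatives (most abundant CDR3, ties broken deterministically)
    are close; each connected group of clusters is one sub-repertoire."""
    nodes = []  # (sample index, representative CDR3, cluster members)
    for i, s in enumerate(samples):
        for members in cluster_sample(s):
            rep = max(members, key=lambda c: (s[c], c))
            nodes.append((i, rep, members))
    uf = UnionFind(range(len(nodes)))
    for j, k in combinations(range(len(nodes)), 2):
        if nodes[j][0] != nodes[k][0] and close(nodes[j][1], nodes[k][1]):
            uf.union(j, k)
    return [set().union(*(nodes[n][2] for n in g)) for g in uf.groups()]


def abundance(sample: Sample, subrep: set[str]) -> float:
    """Fraction of a sample's reads that fall inside a sub-repertoire."""
    return sum(n for c, n in sample.items() if c in subrep) / sum(sample.values())


def permutation_p(cases: list[float], controls: list[float]) -> float:
    """Step 3: exact one-sided permutation test (cases > controls) on the
    difference in mean abundance, enumerating every label assignment."""
    pooled = cases + controls
    observed = mean(cases) - mean(controls)
    hits = total = 0
    for chosen in combinations(range(len(pooled)), len(cases)):
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += mean(a) - mean(b) >= observed - 1e-12
        total += 1
    return hits / total


def rank_cdr3s(cases: list[Sample], controls: list[Sample],
               alpha: float = 0.05) -> list[str]:
    """Step 4 (assumed criterion): rank CDR3s of case-enriched
    sub-repertoires by their total read count across case samples."""
    scores = {}
    for subrep in sub_repertoires(cases + controls):
        p = permutation_p([abundance(s, subrep) for s in cases],
                          [abundance(s, subrep) for s in controls])
        if p <= alpha:
            for c in subrep:
                scores[c] = sum(s.get(c, 0) for s in cases)
    return sorted(scores, key=lambda c: (-scores[c], c))
```

Called as `rank_cdr3s(cases, controls)` with lists of `{CDR3: count}` dictionaries, the sketch returns the CDR3s of case-enriched sub-repertoires, highest case read count first. Exhaustive enumeration of label assignments only scales to small groups; a realistic analysis would use random permutations or a dedicated count-based test, and would correct for testing many sub-repertoires at once.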
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04087-7.
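The claim that more previously known CDR3β sequences were recovered than expected by chance can be made concrete with a one-sided hypergeometric test: draw as many CDR3s at random from all observed CDR3s as the method detected, and ask how likely it is to hit at least as many known sequences. The abstract does not state which test the authors used; the hypergeometric framing, the choice of universe (all CDR3s observed in the analyzed samples), and the names `hypergeom_sf` and `known_cdr3_enrichment` are assumptions made for this sketch.

```python
from math import comb


def hypergeom_sf(k: int, population: int, known: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, known, drawn)."""
    upper = min(known, drawn)
    favourable = sum(
        comb(known, i) * comb(population - known, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)


def known_cdr3_enrichment(detected: set[str], universe: set[str], known: set[str]):
    """Return (observed hits, expected hits, one-sided p-value).

    Assumption: random draws of len(detected) CDR3s from `universe`
    form the null model; known CDR3s never observed are ignored.
    """
    known_obs = known & universe
    detected = detected & universe
    hits = len(detected & known_obs)
    expected = len(detected) * len(known_obs) / len(universe)
    p = hypergeom_sf(hits, len(universe), len(known_obs), len(detected))
    return hits, expected, p
```

For example, if 5 CDR3s are detected out of 20 observed, and 3 of them are among 4 known condition-specific sequences, only 1 hit would be expected by chance, and the one-sided p-value is 496/15504 (about 0.032).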
id pubmed-7993519
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article), BioMed Central, published 2021-03-25 (record pubmed-7993519, 2021-03-26)
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Methodology Article