A novel approach to T-cell receptor beta chain (TCRB) repertoire encoding using lossless string compression

MOTIVATION: T-cell receptor beta chain (TCRB) repertoires are crucial for understanding immune responses. However, their high diversity and complexity present significant challenges in representation and analysis. The main motivation of this study is to develop a unified and compact representation o...

Bibliographic Details
Main authors: Konstantinovsky, Thomas; Yaari, Gur
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10348835/
https://www.ncbi.nlm.nih.gov/pubmed/37417959
http://dx.doi.org/10.1093/bioinformatics/btad426
author Konstantinovsky, Thomas
Yaari, Gur
collection PubMed
description MOTIVATION: T-cell receptor beta chain (TCRB) repertoires are crucial for understanding immune responses. However, their high diversity and complexity present significant challenges in representation and analysis. The main motivation of this study is to develop a unified and compact representation of a TCRB repertoire that can efficiently capture its inherent complexity and diversity and allow for direct inference. RESULTS: We introduce a novel approach to TCRB repertoire encoding and analysis, leveraging the Lempel-Ziv 76 algorithm. This approach allows us to create a graph-like model, identify specific sequence features, and produce a new encoding approach for an individual’s repertoire. The proposed representation enables various applications, including generation probability inference, informative feature vector derivation, sequence generation, a new measure for diversity estimation, and a new sequence centrality measure. The approach was applied to four large-scale public TCRB sequencing datasets, demonstrating its potential for a wide range of applications in big biological sequencing data. AVAILABILITY AND IMPLEMENTATION: A Python package implementing the approach is available at https://github.com/MuteJester/LZGraphs.
format Online
Article
Text
id pubmed-10348835
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10348835 2023-07-15 A novel approach to T-cell receptor beta chain (TCRB) repertoire encoding using lossless string compression Konstantinovsky, Thomas Yaari, Gur Bioinformatics Original Paper MOTIVATION: T-cell receptor beta chain (TCRB) repertoires are crucial for understanding immune responses. However, their high diversity and complexity present significant challenges in representation and analysis. The main motivation of this study is to develop a unified and compact representation of a TCRB repertoire that can efficiently capture its inherent complexity and diversity and allow for direct inference. RESULTS: We introduce a novel approach to TCRB repertoire encoding and analysis, leveraging the Lempel-Ziv 76 algorithm. This approach allows us to create a graph-like model, identify specific sequence features, and produce a new encoding approach for an individual’s repertoire. The proposed representation enables various applications, including generation probability inference, informative feature vector derivation, sequence generation, a new measure for diversity estimation, and a new sequence centrality measure. The approach was applied to four large-scale public TCRB sequencing datasets, demonstrating its potential for a wide range of applications in big biological sequencing data. AVAILABILITY AND IMPLEMENTATION: A Python package implementing the approach is available at https://github.com/MuteJester/LZGraphs. Oxford University Press 2023-07-07 /pmc/articles/PMC10348835/ /pubmed/37417959 http://dx.doi.org/10.1093/bioinformatics/btad426 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel approach to T-cell receptor beta chain (TCRB) repertoire encoding using lossless string compression
topic Original Paper