repgenHMM: a dynamic programming tool to infer the rules of immune receptor generation from sequence data

Bibliographic Details
Main authors: Elhanati, Yuval; Marcou, Quentin; Mora, Thierry; Walczak, Aleksandra M.
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4920122/
https://www.ncbi.nlm.nih.gov/pubmed/27153709
http://dx.doi.org/10.1093/bioinformatics/btw112
Collection: PubMed
Description: Motivation: The diversity of the immune repertoire is initially generated by random rearrangements of the receptor gene during early T and B cell development. Rearrangement scenarios are composed of random events (choices of gene templates, base pair deletions and insertions) described by probability distributions. Not all scenarios are equally likely, and the same receptor sequence may be obtained in several different ways. Quantifying the distribution of these rearrangements is an essential baseline for studying immune system diversity. Inferring the properties of these distributions from receptor sequences is a computationally hard problem, since it requires enumerating every possible scenario for every sampled receptor sequence. Results: We present a hidden Markov model that accounts for all plausible scenarios that can generate the receptor sequences. We developed and implemented a method based on the Baum–Welch algorithm that can efficiently infer the parameters for the different events of the rearrangement process. We tested our software tool on sequence data for both the alpha and beta chains of the T cell receptor. To test the validity of our algorithm, we also generated synthetic sequences from a known model and confirmed that its parameters could be accurately inferred back from those sequences. The inferred model can be used to generate synthetic sequences, to calculate the probability of generation of any receptor sequence, and to compute the theoretical diversity of the repertoire. We estimate this diversity to be [Formula: see text] for human T cells. The model provides a baseline for investigating the selection and dynamics of immune repertoires. Availability and implementation: Source code and sample sequence files are available at https://bitbucket.org/yuvalel/repgenhmm/downloads. Contact: elhanati@lpt.ens.fr or tmora@lps.ens.fr or awalczak@lpt.ens.fr
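The central computational idea in this abstract is that the probability of a sequence is a sum over many hidden scenarios, which a hidden Markov model evaluates by dynamic programming (the forward algorithm) rather than by explicit enumeration, and that the Baum–Welch algorithm (expectation-maximization driven by forward-backward posteriors) re-estimates the parameters. The Python sketch below illustrates only that generic technique on a hypothetical two-state HMM over the letters ACGT. It is not repgenHMM's model (which encodes gene-template choices, deletions and insertions) or its code, and every function name and parameter value here is an assumption made for illustration. Like the validation described in the abstract, it samples synthetic sequences from a known model and re-infers the parameters from a different starting guess.

```python
import itertools
import math
import random

ALPHABET = "ACGT"  # hypothetical toy observation alphabet

def forward(seq, pi, A, B):
    """Scaled forward pass; log P(seq) sums over every hidden path (every "scenario")."""
    n = len(pi)
    alphas, scales, prev = [], [], None
    for t, ch in enumerate(seq):
        o = ALPHABET.index(ch)
        if t == 0:
            a = [pi[i] * B[i][o] for i in range(n)]
        else:
            a = [sum(prev[j] * A[j][i] for j in range(n)) * B[i][o] for i in range(n)]
        c = sum(a)  # scale factor keeps alphas from underflowing
        prev = [x / c for x in a]
        alphas.append(prev)
        scales.append(c)
    return alphas, scales, sum(math.log(c) for c in scales)

def backward(seq, A, B, scales):
    """Scaled backward pass, reusing the scale factors from forward()."""
    n, T = len(A), len(seq)
    betas = [None] * (T - 1) + [[1.0] * n]
    for t in range(T - 2, -1, -1):
        o = ALPHABET.index(seq[t + 1])
        betas[t] = [sum(A[i][j] * B[j][o] * betas[t + 1][j] for j in range(n)) / scales[t + 1]
                    for i in range(n)]
    return betas

def brute_force_prob(seq, pi, A, B):
    """P(seq) by explicitly enumerating all n**len(seq) hidden paths (exponential cost)."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(seq)):
        p = pi[path[0]] * B[path[0]][ALPHABET.index(seq[0])]
        for t in range(1, len(seq)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][ALPHABET.index(seq[t])]
        total += p
    return total

def baum_welch_step(seqs, pi, A, B):
    """One EM update over all sequences; also returns log-likelihood under the input params."""
    n, m = len(pi), len(ALPHABET)
    pi_num = [0.0] * n
    A_num = [[0.0] * n for _ in range(n)]
    B_num = [[0.0] * m for _ in range(n)]
    A_den, B_den = [0.0] * n, [0.0] * n
    total_ll = 0.0
    for seq in seqs:
        alphas, scales, ll = forward(seq, pi, A, B)
        betas = backward(seq, A, B, scales)
        total_ll += ll
        for t in range(len(seq)):
            o = ALPHABET.index(seq[t])
            for i in range(n):
                gamma = alphas[t][i] * betas[t][i]  # P(state i at t | seq)
                if t == 0:
                    pi_num[i] += gamma
                B_num[i][o] += gamma
                B_den[i] += gamma
                if t < len(seq) - 1:
                    A_den[i] += gamma
                    o2 = ALPHABET.index(seq[t + 1])
                    for j in range(n):  # expected i -> j transitions at step t
                        A_num[i][j] += (alphas[t][i] * A[i][j] * B[j][o2]
                                        * betas[t + 1][j] / scales[t + 1])
    new_pi = [x / len(seqs) for x in pi_num]
    new_A = [[A_num[i][j] / A_den[i] for j in range(n)] for i in range(n)]
    new_B = [[B_num[i][k] / B_den[i] for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, total_ll

def sample(pi, A, B, length, rng):
    """Draw one synthetic sequence from the HMM."""
    state = rng.choices(range(len(pi)), weights=pi)[0]
    out = []
    for _ in range(length):
        out.append(rng.choices(ALPHABET, weights=B[state])[0])
        state = rng.choices(range(len(pi)), weights=A[state])[0]
    return "".join(out)

# Validation loop: sample from a known (hypothetical) model, then re-infer it.
rng = random.Random(0)
true_pi, true_A = [0.6, 0.4], [[0.9, 0.1], [0.2, 0.8]]
true_B = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]
seqs = [sample(true_pi, true_A, true_B, 40, rng) for _ in range(100)]
pi, A = [0.5, 0.5], [[0.6, 0.4], [0.4, 0.6]]  # deliberately wrong starting guess
B = [[0.3, 0.2, 0.2, 0.3], [0.2, 0.3, 0.3, 0.2]]
history = []
for _ in range(20):
    pi, A, B, ll = baum_welch_step(seqs, pi, A, B)
    history.append(ll)
print("log-likelihood:", round(history[0], 1), "->", round(history[-1], 1))
print("inferred A:", [[round(x, 2) for x in row] for row in A])
```

On short sequences, `forward` should agree with `brute_force_prob` while its cost grows linearly rather than exponentially with sequence length. The log-likelihood should not decrease from one EM iteration to the next; the inferred hidden states may come out in swapped order relative to the true model (label switching), and EM only guarantees convergence to a local optimum.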
ID: pubmed-4920122
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (Original Papers), Oxford University Press. Dates: 2016-07-01; 2016-03-07.
License: © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Topic: Original Papers