
Deep generative models for T cell receptor protein sequences

Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences fo...


Bibliographic Details
Main authors: Davidsen, Kristian; Olson, Branden J; DeWitt, William S; Feng, Jean; Harkins, Elias; Bradley, Philip; Matsen, Frederick A
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2019
Subjects: Computational and Systems Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6728137/
https://www.ncbi.nlm.nih.gov/pubmed/31487240
http://dx.doi.org/10.7554/eLife.46935
author Davidsen, Kristian
Olson, Branden J
DeWitt, William S
Feng, Jean
Harkins, Elias
Bradley, Philip
Matsen, Frederick A
collection PubMed
description Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences for stimulation via vaccination. Classically, these models are defined in terms of a probabilistic V(D)J recombination model which is sometimes combined with a selection model. In this paper we take a different approach, fitting variational autoencoder (VAE) models parameterized by deep neural networks to T cell receptor (TCR) repertoires. We show that simple VAE models can perform accurate cohort frequency estimation, learn the rules of VDJ recombination, and generalize well to unseen sequences. Further, we demonstrate that VAE-like models can distinguish between real sequences and sequences generated according to a recombination-selection model, and that many characteristics of VAE-generated sequences are similar to those of real sequences.
format Online
Article
Text
id pubmed-6728137
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher eLife Sciences Publications, Ltd
record_format MEDLINE/PubMed
spelling pubmed-6728137 2019-09-10 Deep generative models for T cell receptor protein sequences Davidsen, Kristian Olson, Branden J DeWitt, William S Feng, Jean Harkins, Elias Bradley, Philip Matsen, Frederick A eLife Computational and Systems Biology Probabilistic models of adaptive immune repertoire sequence distributions can be used to infer the expansion of immune cells in response to stimulus, differentiate genetic from environmental factors that determine repertoire sharing, and evaluate the suitability of various target immune sequences for stimulation via vaccination. Classically, these models are defined in terms of a probabilistic V(D)J recombination model which is sometimes combined with a selection model. In this paper we take a different approach, fitting variational autoencoder (VAE) models parameterized by deep neural networks to T cell receptor (TCR) repertoires. We show that simple VAE models can perform accurate cohort frequency estimation, learn the rules of VDJ recombination, and generalize well to unseen sequences. Further, we demonstrate that VAE-like models can distinguish between real sequences and sequences generated according to a recombination-selection model, and that many characteristics of VAE-generated sequences are similar to those of real sequences. eLife Sciences Publications, Ltd 2019-09-05 /pmc/articles/PMC6728137/ /pubmed/31487240 http://dx.doi.org/10.7554/eLife.46935 Text en © 2019, Davidsen et al http://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
title Deep generative models for T cell receptor protein sequences
topic Computational and Systems Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6728137/
https://www.ncbi.nlm.nih.gov/pubmed/31487240
http://dx.doi.org/10.7554/eLife.46935