
Identification of Subject-Specific Immunoglobulin Alleles From Expressed Repertoire Sequencing Data

The adaptive immune receptor repertoire (AIRR) contains information on an individual's immune past, present and potential in the form of the evolving sequences that encode the B cell receptor (BCR) repertoire. AIRR sequencing (AIRR-seq) studies rely on databases of known BCR germline variable (...


Bibliographic Details
Main Authors: Gadala-Maria, Daniel; Gidoni, Moriah; Marquez, Susanna; Vander Heiden, Jason A.; Kos, Justin T.; Watson, Corey T.; O'Connor, Kevin C.; Yaari, Gur; Kleinstein, Steven H.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Immunology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6381938/
https://www.ncbi.nlm.nih.gov/pubmed/30814994
http://dx.doi.org/10.3389/fimmu.2019.00129
author Gadala-Maria, Daniel
Gidoni, Moriah
Marquez, Susanna
Vander Heiden, Jason A.
Kos, Justin T.
Watson, Corey T.
O'Connor, Kevin C.
Yaari, Gur
Kleinstein, Steven H.
collection PubMed
description The adaptive immune receptor repertoire (AIRR) contains information on an individual's immune past, present and potential in the form of the evolving sequences that encode the B cell receptor (BCR) repertoire. AIRR sequencing (AIRR-seq) studies rely on databases of known BCR germline variable (V), diversity (D), and joining (J) genes to detect somatic mutations in AIRR-seq data via comparison to the best-aligning database alleles. However, it has been shown that these databases are far from complete, leading to systematic misidentification of mutated positions in subsets of sample sequences. We previously presented TIgGER, a computational method to identify subject-specific V gene genotypes, including the presence of novel V gene alleles, directly from AIRR-seq data. However, the original algorithm was unable to detect alleles that differed by more than 5 single nucleotide polymorphisms (SNPs) from a database allele. Here we present and apply an improved version of the TIgGER algorithm that can detect alleles that differ by any number of SNPs from the nearest database allele, and can construct subject-specific genotypes with minimal prior information. TIgGER predictions are validated both computationally (using a leave-one-out strategy) and experimentally (using genomic sequencing), resulting in the addition of three new immunoglobulin heavy chain V (IGHV) gene alleles to the IMGT repertoire. Finally, we develop a Bayesian strategy to provide a confidence estimate associated with genotype calls. Altogether, these methods allow for much higher accuracy in germline allele assignment, an essential step in AIRR-seq studies.
format Online
Article
Text
id pubmed-6381938
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6381938 2019-02-27 Identification of Subject-Specific Immunoglobulin Alleles From Expressed Repertoire Sequencing Data. Front Immunol. Immunology. Frontiers Media S.A. 2019-02-13.
/pmc/articles/PMC6381938/ /pubmed/30814994 http://dx.doi.org/10.3389/fimmu.2019.00129
Text en. Copyright © 2019 Gadala-Maria, Gidoni, Marquez, Vander Heiden, Kos, Watson, O'Connor, Yaari and Kleinstein. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of Subject-Specific Immunoglobulin Alleles From Expressed Repertoire Sequencing Data
topic Immunology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6381938/
https://www.ncbi.nlm.nih.gov/pubmed/30814994
http://dx.doi.org/10.3389/fimmu.2019.00129