
Optimized Threshold Inference for Partitioning of Clones From High-Throughput B Cell Repertoire Sequencing Data

During adaptive immune responses, activated B cells expand and undergo somatic hypermutation of their B cell receptor (BCR), forming a clone of diversified cells that can be related back to a common ancestor. Identification of B cell clones from high-throughput Adaptive Immune Receptor Repertoire se...


Bibliographic Details
Main authors: Nouri, Nima; Kleinstein, Steven H.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects: Immunology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6070604/
https://www.ncbi.nlm.nih.gov/pubmed/30093903
http://dx.doi.org/10.3389/fimmu.2018.01687
collection PubMed
description During adaptive immune responses, activated B cells expand and undergo somatic hypermutation of their B cell receptor (BCR), forming a clone of diversified cells that can be related back to a common ancestor. Identification of B cell clones from high-throughput Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) data relies on computational analysis. Recently, we proposed an automated method to partition sequences into clonal groups based on single-linkage hierarchical clustering of the BCR junction region with length-normalized Hamming distance metric. This method could identify clonal sequences with high confidence on several benchmark experimental and simulated data sets. However, determining the threshold to cut the hierarchy, a key step in the method, is computationally expensive for large-scale repertoire sequencing data sets. Moreover, the methodology was unable to provide estimates of accuracy for new data. Here, a new method is presented that addresses this computational bottleneck and also provides a study-specific estimation of performance, including sensitivity and specificity. The method uses a finite mixture model fitting procedure for learning the parameters of two univariate curves which fit the bimodal distribution of the distance vector between pairs of sequences. These distributions are used to estimate the performance of different threshold choices for partitioning sequences into clones. These performance estimates are validated using simulated and experimental data sets. With this method, clones can be identified from AIRR-seq data with sensitivity and specificity profiles that are user-defined based on the overall goals of the study.
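The threshold-selection idea in the description can be illustrated with a minimal sketch. It assumes Gaussian components (the description says only "two univariate curves"), a plain EM fit, and hypothetical function names (`normalized_hamming`, `fit_two_gaussians`, `estimate_performance`); it is not the authors' implementation. The lower-distance component is taken to represent clonally related pairs and the higher-distance component unrelated pairs.

```python
import math
from statistics import NormalDist, mean, pstdev


def normalized_hamming(a: str, b: str) -> float:
    """Hamming distance between equal-length junctions, divided by their length."""
    if len(a) != len(b) or not a:
        raise ValueError("junctions must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fit_two_gaussians(xs, iters=100, min_sd=1e-3):
    """Fit a two-component univariate Gaussian mixture by EM (assumed model)."""
    xs = sorted(xs)
    n, half = len(xs), len(xs) // 2
    # Initialise each component from one half of the sorted distances.
    parts = (xs[:half], xs[half:])
    mu = [mean(p) for p in parts]
    sd = [max(pstdev(p), min_sd) for p in parts]
    w = [0.5, 0.5]
    for _ in range(iters):
        comps = [NormalDist(mu[k], sd[k]) for k in (0, 1)]
        # E-step: responsibility of each component for each distance.
        resp = []
        for x in xs:
            p = [w[k] * comps[k].pdf(x) for k in (0, 1)]
            total = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means, and standard deviations.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), min_sd)
            w[k] = nk / n
    return w, mu, sd


def estimate_performance(threshold, mu, sd):
    """Model-based sensitivity and specificity for a given distance threshold."""
    # Sensitivity: share of the low-distance (clonal) component below the cut.
    sensitivity = NormalDist(mu[0], sd[0]).cdf(threshold)
    # Specificity: share of the high-distance (unrelated) component above the cut.
    specificity = 1.0 - NormalDist(mu[1], sd[1]).cdf(threshold)
    return sensitivity, specificity
```

Under these assumptions, a user would compute distances with `normalized_hamming`, fit the mixture once, and then scan candidate thresholds with `estimate_performance` to pick one that meets the sensitivity and specificity required by the study.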
format Online
Article
Text
id pubmed-6070604
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6070604 2018-08-09
Optimized Threshold Inference for Partitioning of Clones From High-Throughput B Cell Repertoire Sequencing Data
Nouri, Nima
Kleinstein, Steven H.
Front Immunol
Immunology
Frontiers Media S.A. 2018-07-26
/pmc/articles/PMC6070604/ /pubmed/30093903 http://dx.doi.org/10.3389/fimmu.2018.01687
Text en
Copyright © 2018 Nouri and Kleinstein. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Immunology