A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data

Bibliographic Details
Main authors: Nouri, Nima; Kleinstein, Steven H
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022594/
https://www.ncbi.nlm.nih.gov/pubmed/29949968
http://dx.doi.org/10.1093/bioinformatics/bty235
author Nouri, Nima
Kleinstein, Steven H
collection PubMed
description MOTIVATION: B cells derive their antigen-specificity through the expression of Immunoglobulin (Ig) receptors on their surface. These receptors are initially generated stochastically by somatic re-arrangement of the DNA and further diversified following antigen-activation by a process of somatic hypermutation, which introduces mainly point substitutions into the receptor DNA at a high rate. Recent advances in next-generation sequencing have enabled large-scale profiling of the B cell Ig repertoire from blood and tissue samples. A key computational challenge in the analysis of these data is partitioning the sequences to identify descendants of a common B cell (i.e. a clone). Current methods group sequences using a fixed distance threshold, or a likelihood calculation that is computationally-intensive. Here, we propose a new method based on spectral clustering with an adaptive threshold to determine the local sequence neighborhood. Validation using simulated and experimental datasets demonstrates that this method has high sensitivity and specificity compared to a fixed threshold that is optimized for these measures. In addition, this method works on datasets where choosing an optimal fixed threshold is difficult and is more computationally efficient in all cases. The ability to quickly and accurately identify members of a clone from repertoire sequencing data will greatly improve downstream analyses. Clonally-related sequences cannot be treated independently in statistical models, and clonal partitions are used as the basis for the calculation of diversity metrics, lineage reconstruction and selection analysis. Thus, the spectral clustering-based method here represents an important contribution to repertoire analysis. AVAILABILITY AND IMPLEMENTATION: Source code for this method is freely available in the SCOPe (Spectral Clustering for clOne Partitioning) R package in the Immcantation framework: www.immcantation.org under the CC BY-SA 4.0 license. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
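The abstract contrasts the proposed method with approaches that group sequences using a fixed distance threshold. As a rough illustration of that baseline only, the sketch below links any two sequences whose normalized Hamming distance is at or below a cutoff and reports the connected groups. It is not the SCOPe implementation: the distance measure, the single-linkage rule, the function names, the toy sequences and the 0.2 cutoff are all assumptions made for this example.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fixed_threshold_clones(seqs, threshold):
    """Group sequence indices by single linkage under a fixed cutoff.

    Two sequences are linked when their normalized Hamming distance is
    <= threshold; a group is a connected component of those links.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming_fraction(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy sequences invented for illustration (all 12 nucleotides long).
junctions = [
    "TGTGCGAGAGAT",
    "TGTGCGAGAGAC",  # one mismatch from the first
    "TGTGCAAGAGAC",  # one mismatch from the second
    "TGTACTTTCTGG",  # unrelated
]

# Hypothetical cutoff of 0.2; the first three sequences end up in one group.
clones = fixed_threshold_clones(junctions, threshold=0.2)
print(clones)
```

The cutoff has to be chosen before clustering and applies to every sequence alike, which is the limitation the abstract points to when it says that choosing an optimal fixed threshold can be difficult. The spectral clustering method with an adaptive threshold described in the abstract is not reproduced here.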
format Online
Article
Text
id pubmed-6022594
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022594 2018-07-10 Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022594/ /pubmed/29949968 http://dx.doi.org/10.1093/bioinformatics/bty235 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022594/
https://www.ncbi.nlm.nih.gov/pubmed/29949968
http://dx.doi.org/10.1093/bioinformatics/bty235