
Clustering-based identification of clonally-related immunoglobulin gene sequence sets

BACKGROUND: Clonal expansion of B lymphocytes coupled with somatic mutation and antigen selection allows the mammalian humoral immune system to generate highly specific immunoglobulins (IG) or antibodies against invading bacteria, viruses and toxins. The availability of high-throughput DNA sequencing...


Bibliographic Details
Main Authors: Chen, Zhiliang; Collins, Andrew M; Wang, Yan; Gaëta, Bruno A
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2946782/
https://www.ncbi.nlm.nih.gov/pubmed/20875155
http://dx.doi.org/10.1186/1745-7580-6-S1-S4
_version_ 1782187338505388032
author Chen, Zhiliang
Collins, Andrew M
Wang, Yan
Gaëta, Bruno A
collection PubMed
description BACKGROUND: Clonal expansion of B lymphocytes coupled with somatic mutation and antigen selection allows the mammalian humoral immune system to generate highly specific immunoglobulins (IG) or antibodies against invading bacteria, viruses and toxins. The availability of high-throughput DNA sequencing methods is providing new avenues for studying this clonal expansion and identifying the factors guiding the generation of antibodies. The identification of groups of rearranged immunoglobulin gene sequences descended from the same rearrangement (clonally-related sets) in very large sets of sequences is facilitated by the availability of immunoglobulin gene sequence alignment and partitioning software that can accurately predict component germline genes, but has required painstaking visual inspection and analysis of sequences. RESULTS: We have developed and implemented an algorithm for identifying sets of clonally-related sequences in large human immunoglobulin heavy chain gene variable region sequence sets. The program processes sequences that have been partitioned using iHMMune-align, and uses pairwise comparisons of CDR3 sequences and similarity in IGHV and IGHJ germline gene assignments to construct a distance matrix. Agglomerative hierarchical clustering is then used to identify likely groups of clonally-related sequences. The program is available for download from http://www.cse.unsw.edu.au/~ihmmune/ClonalRelate/ClonalRelate.zip. CONCLUSIONS: The method was evaluated on several benchmark datasets and provided a more accurate and considerably faster identification of clonally-related immunoglobulin gene sequences than visual inspection by domain experts.
format Text
id pubmed-2946782
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2946782 2010-09-29 Clustering-based identification of clonally-related immunoglobulin gene sequence sets Chen, Zhiliang Collins, Andrew M Wang, Yan Gaëta, Bruno A Immunome Res Proceedings
BioMed Central 2010-09-27 /pmc/articles/PMC2946782/ /pubmed/20875155 http://dx.doi.org/10.1186/1745-7580-6-S1-S4 Text en Copyright ©2010 Gaëta et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Clustering-based identification of clonally-related immunoglobulin gene sequence sets
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2946782/
https://www.ncbi.nlm.nih.gov/pubmed/20875155
http://dx.doi.org/10.1186/1745-7580-6-S1-S4
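
The description field above outlines the method in three steps: pairwise comparison of CDR3 sequences, combination with IGHV and IGHJ germline gene agreement into a distance matrix, and agglomerative hierarchical clustering. The following is a minimal sketch of that general idea only, not the ClonalRelate implementation. The distance formula, the `gene_penalty` weight, the `threshold` cutoff, the use of average linkage, the record layout and the example sequences are all assumptions made for illustration; the record does not specify any of them.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def pair_distance(s, t, gene_penalty=0.5):
    """Hypothetical distance: length-normalised CDR3 edit distance plus a
    fixed penalty for each differing germline gene assignment."""
    longest = max(len(s["cdr3"]), len(t["cdr3"]), 1)
    d = edit_distance(s["cdr3"], t["cdr3"]) / longest
    d += gene_penalty * (s["ighv"] != t["ighv"])
    d += gene_penalty * (s["ighj"] != t["ighj"])
    return d


def cluster(seqs, threshold=0.2):
    """Average-linkage agglomerative clustering. Merging stops once the
    closest pair of clusters is farther apart than `threshold` (assumed)."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = pair_distance(seqs[i], seqs[j])

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so y is still valid
        del clusters[y]
    return [sorted(c) for c in clusters]


# Made-up example records (not taken from the paper's benchmark data).
example = [
    {"cdr3": "GCGAGAGATTACTACGGT", "ighv": "IGHV3-23", "ighj": "IGHJ4"},
    {"cdr3": "GCGAGAGATTACTACGGA", "ighv": "IGHV3-23", "ighj": "IGHJ4"},
    {"cdr3": "GCGAAAGGGCTTTGACTAC", "ighv": "IGHV1-69", "ighj": "IGHJ6"},
]
print(cluster(example))  # [[0, 1], [2]]
```

In this sketch the first two records share both gene assignments and differ by one CDR3 nucleotide, so they merge; the third differs in both genes and stays separate. A real pipeline would take the gene assignments and CDR3 boundaries from iHMMune-align output rather than hand-written dictionaries.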