
Feature selection using a one dimensional naïve Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires

MOTIVATION: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compa...


Bibliographic Details
Main authors: Cinelli, Mattia, Sun, Yuxin, Best, Katharine, Heather, James M, Reich-Zeliger, Shlomit, Shifrut, Eric, Friedman, Nir, Shawe-Taylor, John, Chain, Benny
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860388/
https://www.ncbi.nlm.nih.gov/pubmed/28073756
http://dx.doi.org/10.1093/bioinformatics/btw771
_version_ 1783307972216094720
author Cinelli, Mattia
Sun, Yuxin
Best, Katharine
Heather, James M
Reich-Zeliger, Shlomit
Shifrut, Eric
Friedman, Nir
Shawe-Taylor, John
Chain, Benny
author_sort Cinelli, Mattia
collection PubMed
description MOTIVATION: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund’s adjuvant (CFA) or CFA alone. RESULTS: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. SUMMARY: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund’s Adjuvant. AVAILABILITY AND IMPLEMENTATION: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5860388
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5860388 2018-03-28 Feature selection using a one dimensional naïve Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires Cinelli, Mattia Sun, Yuxin Best, Katharine Heather, James M Reich-Zeliger, Shlomit Shifrut, Eric Friedman, Nir Shawe-Taylor, John Chain, Benny Bioinformatics Discovery Note MOTIVATION: Somatic DNA recombination, the hallmark of vertebrate adaptive immunity, has the potential to generate a vast diversity of antigen receptor sequences. How this diversity captures antigen specificity remains incompletely understood. In this study we use high throughput sequencing to compare the global changes in T cell receptor β chain complementarity determining region 3 (CDR3β) sequences following immunization with ovalbumin administered with complete Freund’s adjuvant (CFA) or CFA alone. RESULTS: The CDR3β sequences were deconstructed into short stretches of overlapping contiguous amino acids. The motifs were ranked according to a one-dimensional Bayesian classifier score comparing their frequency in the repertoires of the two immunization classes. The top ranking motifs were selected and used to create feature vectors which were used to train a support vector machine. The support vector machine achieved high classification scores in a leave-one-out validation test reaching >90% in some cases. SUMMARY: The study describes a novel two-stage classification strategy combining a one-dimensional Bayesian classifier with a support vector machine. Using this approach we demonstrate that the frequency of a small number of linear motifs three amino acids in length can accurately identify a CD4 T cell response to ovalbumin against a background response to the complex mixture of antigens which characterize Complete Freund’s Adjuvant. AVAILABILITY AND IMPLEMENTATION: The sequence data is available at www.ncbi.nlm.nih.gov/sra/?term=SRP075893. The Decombinator package is available at github.com/innate2adaptive/Decombinator. The R package e1071 is available at the CRAN repository https://cran.r-project.org/web/packages/e1071/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2017-04-01 2017-01-05 /pmc/articles/PMC5860388/ /pubmed/28073756 http://dx.doi.org/10.1093/bioinformatics/btw771 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Feature selection using a one dimensional naïve Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires
topic Discovery Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860388/
https://www.ncbi.nlm.nih.gov/pubmed/28073756
http://dx.doi.org/10.1093/bioinformatics/btw771
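The first stage described in the abstract (splitting CDR3β sequences into overlapping triplets, ranking the triplets by how differently they occur in the two immunization classes, and turning the top-ranked triplets into feature vectors) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the one-dimensional Bayesian classifier score, so the absolute log ratio of smoothed motif frequencies used here is an assumed stand-in, and the function names, the pseudocount and the toy sequences are all hypothetical. The support vector machine stage, for which the record names the R package e1071, is not reproduced.

```python
from collections import Counter
from math import log


def kmers(cdr3, k=3):
    """Return all overlapping contiguous k-mers of one amino acid sequence."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def motif_counts(repertoire, k=3):
    """Count every k-mer across all sequences of one repertoire."""
    counts = Counter()
    for seq in repertoire:
        counts.update(kmers(seq, k))
    return counts


def rank_motifs(class_a, class_b, k=3, pseudocount=1.0):
    """Rank motifs by an ASSUMED score: |log ratio| of smoothed frequencies.

    class_a and class_b are lists of repertoires (each a list of sequences).
    The paper's actual scoring function may differ from this stand-in.
    """
    counts_a, counts_b = Counter(), Counter()
    for rep in class_a:
        counts_a.update(motif_counts(rep, k))
    for rep in class_b:
        counts_b.update(motif_counts(rep, k))
    vocab = sorted(set(counts_a) | set(counts_b))
    denom_a = sum(counts_a.values()) + pseudocount * len(vocab)
    denom_b = sum(counts_b.values()) + pseudocount * len(vocab)
    scores = {}
    for motif in vocab:
        p_a = (counts_a[motif] + pseudocount) / denom_a
        p_b = (counts_b[motif] + pseudocount) / denom_b
        scores[motif] = abs(log(p_a / p_b))
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda m: (-scores[m], m))


def feature_vector(repertoire, motifs, k=3):
    """Relative frequency of each selected motif within one repertoire."""
    counts = motif_counts(repertoire, k)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(motifs)
    return [counts[m] / total for m in motifs]


if __name__ == "__main__":
    # Hypothetical toy repertoires, not data from the study.
    ova_cfa = [["CASSLGG", "CASSLGQ"], ["CASSLGA"]]
    cfa_only = [["CASSPDR", "CASRPDR"], ["CASSPDT"]]
    top = rank_motifs(ova_cfa, cfa_only)[:5]
    print(top)
    print(feature_vector(ova_cfa[0], top))
```

The resulting vectors, one per repertoire, would then be passed to a classifier of the reader's choice and evaluated with leave-one-out validation as the abstract describes.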