Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences

SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. There are currently multiple tools available for the annotation of antibody sequences. All downstream analyses such as choosing lead drug candidates depend on the co...

Bibliographic Details
Main authors: Smakaj, Erand; Babrak, Lmar; Ohlin, Mats; Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D’Angelo, Igor; Olson, Branden; Reddy, Sai; Greiff, Victor; Trück, Johannes; Marquez, Susanna; Lees, William; Miho, Enkelejda
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7075533/
https://www.ncbi.nlm.nih.gov/pubmed/31873728
http://dx.doi.org/10.1093/bioinformatics/btz845
author Smakaj, Erand
Babrak, Lmar
Ohlin, Mats
Shugay, Mikhail
Briney, Bryan
Tosoni, Deniz
Galli, Christopher
Grobelsek, Vendi
D’Angelo, Igor
Olson, Branden
Reddy, Sai
Greiff, Victor
Trück, Johannes
Marquez, Susanna
Lees, William
Miho, Enkelejda
author_sort Smakaj, Erand
collection PubMed
description SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed using simulated and experimental high-throughput sequencing datasets. We analyzed changes in the IMGT reference germline database over the last 10 years in order to assess the reproducibility of the annotation output. We found that only 73/183 (40%) human V, D and J genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average frequency of gene mishits (0.02) and IgBLAST the lowest (0.004). Reproducibility in the output of complementarity-determining region 3 (CDR3) amino acid sequences ranged from 4.3% to 77.6% with preprocessed data. In addition, the run time of the tools was assessed: MiXCR was the fastest tool in terms of the number of sequences processed per unit of time. These results indicate that immunoinformatic analyses greatly depend on the choice of bioinformatics tool. Our results support informed decision-making for immunoinformaticians based on repertoire composition and sequencing platforms. AVAILABILITY AND IMPLEMENTATION: All tools utilized in the paper are free for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
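The two headline metrics in the summary (the share of germline genes common to the tools' reference sets, and the gene mishit frequency) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the record does not state how the authors computed either number, so the sketch takes "shared" to mean genes present in every reference set divided by the union of all sets, and "mishit" to mean a called gene that differs from the simulated true gene. The function names and the toy gene labels are hypothetical.

```python
from typing import Dict, Sequence, Set


def shared_gene_fraction(reference_sets: Dict[str, Set[str]]) -> float:
    """Genes present in every reference set, as a fraction of the union.

    Assumed definition; the catalog record does not spell out the formula.
    """
    sets = list(reference_sets.values())
    if not sets:
        return 0.0
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


def mishit_frequency(true_genes: Sequence[str], called_genes: Sequence[str]) -> float:
    """Fraction of sequences whose called gene differs from the true gene.

    Assumed definition of a "mishit" for simulated data with known truth.
    """
    if len(true_genes) != len(called_genes):
        raise ValueError("true_genes and called_genes must have equal length")
    if not true_genes:
        return 0.0
    mishits = sum(t != c for t, c in zip(true_genes, called_genes))
    return mishits / len(true_genes)


if __name__ == "__main__":
    # Hypothetical toy reference sets, not the real germline databases.
    toy_refs = {
        "tool_a": {"IGHV1-2", "IGHV3-23", "IGHJ4"},
        "tool_b": {"IGHV3-23", "IGHJ4", "IGHD3-10"},
    }
    print(shared_gene_fraction(toy_refs))

    # Hypothetical calls for four simulated sequences; one is wrong.
    truth = ["IGHV1-2", "IGHV3-23", "IGHV3-23", "IGHJ4"]
    calls = ["IGHV1-2", "IGHV3-23", "IGHV1-2", "IGHJ4"]
    print(mishit_frequency(truth, calls))

    # The percentage reported in the summary follows from the raw counts.
    print(round(73 / 183 * 100))
```

Under these assumed definitions, the reported 73 shared genes out of 183 round to the 40% stated in the summary.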
format Online
Article
Text
id pubmed-7075533
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7075533 2020-03-19 Bioinformatics Original Papers Oxford University Press 2020-03-15 2019-12-24 /pmc/articles/PMC7075533/ /pubmed/31873728 http://dx.doi.org/10.1093/bioinformatics/btz845 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences
topic Original Papers