Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences
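Main authors: | Smakaj, Erand; Babrak, Lmar; Ohlin, Mats; Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D’Angelo, Igor; Olson, Branden; Reddy, Sai; Greiff, Victor; Trück, Johannes; Marquez, Susanna; Lees, William; Miho, Enkelejda |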
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7075533/ https://www.ncbi.nlm.nih.gov/pubmed/31873728 http://dx.doi.org/10.1093/bioinformatics/btz845 |
Field | Value |
---|---|
author | Smakaj, Erand; Babrak, Lmar; Ohlin, Mats; Shugay, Mikhail; Briney, Bryan; Tosoni, Deniz; Galli, Christopher; Grobelsek, Vendi; D’Angelo, Igor; Olson, Branden; Reddy, Sai; Greiff, Victor; Trück, Johannes; Marquez, Susanna; Lees, William; Miho, Enkelejda |
collection | PubMed |
description | SUMMARY: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, namely IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. To assess the reproducibility of the annotation output, we analyzed changes in the IMGT reference germline database over the last 10 years. We found that only 73/183 (40%) of human V, D and J genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average gene mishit frequency (0.02) and IgBLAST the lowest (0.004). Reproducibility of the complementarity-determining region 3 (CDR3) amino acid output ranged from 4.3% to 77.6% with preprocessed data. Run time was also assessed: MiXCR was the fastest tool in terms of sequences processed per unit of time. These results indicate that immunoinformatic analyses depend greatly on the choice of bioinformatics tool. Our results support immunoinformaticians in making informed decisions based on repertoire composition and sequencing platform. AVAILABILITY AND IMPLEMENTATION: All tools utilized in the paper are free for academic use. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7075533 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Bioinformatics (Original Papers) |
publication dates | 2019-12-24 (online); 2020-03-15 (issue) |
license | © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7075533/ https://www.ncbi.nlm.nih.gov/pubmed/31873728 http://dx.doi.org/10.1093/bioinformatics/btz845 |
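The abstract reports three headline figures: a gene mishit frequency, a CDR3 reproducibility percentage and a gene-sharing fraction (73/183). This record does not say exactly how they were computed. The Python sketch below shows one plausible way to compute metrics of this kind. It rests on assumptions that are not taken from the paper: calls are compared at gene level with the allele suffix (`*01`) dropped, and "reproducibility" between two tools is modelled as the Jaccard index of their CDR3 amino acid sets. The function names and all example sequences are hypothetical toy data, not the authors' pipeline or results.

```python
"""Hypothetical metric sketch; definitions are assumptions, not the paper's code."""


def gene_name(call: str) -> str:
    # Assumption: compare at gene level, so drop the allele suffix
    # ("IGHV1-2*02" -> "IGHV1-2").
    return call.split("*", 1)[0]


def mishit_frequency(true_calls: list[str], tool_calls: list[str]) -> float:
    """Fraction of simulated sequences whose assigned gene differs from the true gene."""
    if len(true_calls) != len(tool_calls):
        raise ValueError("call lists must be paired per sequence")
    if not true_calls:
        return 0.0
    mishits = sum(gene_name(t) != gene_name(c) for t, c in zip(true_calls, tool_calls))
    return mishits / len(true_calls)


def shared_fraction(set_a: set[str], set_b: set[str]) -> float:
    """Items present in both sets relative to their union (Jaccard index)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy simulated truth vs. one tool's V calls (hypothetical data).
    truth = ["IGHV1-2*02", "IGHV3-23*01", "IGHV4-34*01", "IGHV1-69*01"]
    calls = ["IGHV1-2*04", "IGHV3-23*01", "IGHV4-59*01", "IGHV1-69*02"]
    print(mishit_frequency(truth, calls))  # 0.25: only the IGHV4-34 call is a mishit

    # Toy CDR3 amino acid sets reported by two tools (hypothetical data).
    cdr3_tool_a = {"CARDYW", "CAKGGFDYW", "CARSSGWYFDLW"}
    cdr3_tool_b = {"CARDYW", "CAKGGFDYW", "CTTDPW"}
    print(round(shared_fraction(cdr3_tool_a, cdr3_tool_b), 2))  # 0.5

    # The abstract's gene-sharing figure as plain arithmetic: 73 of 183 genes.
    print(f"{73 / 183:.1%}")  # 39.9%
```

Under these assumptions, an allele-only disagreement (IGHV1-2*02 vs. IGHV1-2*04) is not counted as a mishit. A stricter, allele-level definition would simply compare the full call strings instead of `gene_name(...)`.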