
CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching

MOTIVATION: Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited...


Bibliographic Details
Main authors: Rognes, Torbjørn, Scheffer, Lonneke, Greiff, Victor, Sandve, Geir Kjetil
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9438946/
https://www.ncbi.nlm.nih.gov/pubmed/35852318
http://dx.doi.org/10.1093/bioinformatics/btac505
author Rognes, Torbjørn
Scheffer, Lonneke
Greiff, Victor
Sandve, Geir Kjetil
collection PubMed
description MOTIVATION: Adaptive immune receptor (AIR) repertoires (AIRRs) record past immune encounters with exquisite specificity. Therefore, identifying identical or similar AIR sequences across individuals is a key step in AIRR analysis for revealing convergent immune response patterns that may be exploited for diagnostics and therapy. Existing methods for quantifying AIRR overlap scale poorly with increasing dataset numbers and sizes. To address this limitation, we developed CompAIRR, which enables ultra-fast computation of AIRR overlap, based on either exact or approximate sequence matching. RESULTS: CompAIRR improves computational speed 1000-fold relative to the state of the art and uses only one-third of the memory: on the same machine, the exact pairwise AIRR overlap of 10^4 AIRRs with 10^5 sequences is found in ∼17 min, while the fastest alternative tool requires 10 days. CompAIRR has been integrated with the machine learning ecosystem immuneML to speed up commonly used AIRR-based machine learning applications. AVAILABILITY AND IMPLEMENTATION: CompAIRR code and documentation are available at https://github.com/uio-bmi/compairr. Docker images are available at https://hub.docker.com/r/torognes/compairr. The code to replicate the synthetic datasets, scripts for benchmarking and creating figures, and all raw data underlying the figures are available at https://github.com/uio-bmi/compairr-benchmarking. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
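To make the notion of repertoire overlap in the description concrete, here is a minimal Python sketch of counting matches between two repertoires, either exactly or allowing a small number of substitutions. It is a naive illustration only, not CompAIRR's algorithm or interface: the function names, the `max_diff` parameter, and the toy sequences are all hypothetical, and the brute-force approximate comparison would not scale to the dataset sizes quoted above.

```python
from itertools import product


def hamming_within(a: str, b: str, max_diff: int) -> bool:
    """True if a and b have equal length and differ in at most max_diff positions."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= max_diff


def overlap(rep_a: set[str], rep_b: set[str], max_diff: int = 0) -> int:
    """Count matching sequence pairs, one sequence taken from each repertoire."""
    if max_diff == 0:
        # Exact matching reduces to a set intersection.
        return len(rep_a & rep_b)
    # Approximate matching: brute-force comparison of all pairs (quadratic cost).
    return sum(hamming_within(a, b, max_diff) for a, b in product(rep_a, rep_b))


# Hypothetical toy repertoires (made-up sequences, for illustration only).
repertoires = {
    "donor1": {"CASSLGTDTQYF", "CASSPGQGNEQFF", "CASSIRSSYEQYF"},
    "donor2": {"CASSLGTDTQYF", "CASSPGQGNEQYF", "CASRDRGNTIYF"},
}

exact = overlap(repertoires["donor1"], repertoires["donor2"])
approx = overlap(repertoires["donor1"], repertoires["donor2"], max_diff=1)
print(exact, approx)
```

With these toy inputs, exact matching finds the one shared sequence, while allowing one substitution also pairs the two sequences that differ at a single position.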
format Online
Article
Text
id pubmed-9438946
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9438946 2022-09-06 CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching Rognes, Torbjørn Scheffer, Lonneke Greiff, Victor Sandve, Geir Kjetil Bioinformatics Applications Note Oxford University Press 2022-07-19 /pmc/articles/PMC9438946/ /pubmed/35852318 http://dx.doi.org/10.1093/bioinformatics/btac505 Text en © The Author(s) 2022. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching
topic Applications Note