
PyIR: a scalable wrapper for processing billions of immunoglobulin and T cell receptor sequences using IgBLAST


Bibliographic Details
Main Authors: Soto, Cinque; Finn, Jessica A.; Willis, Jordan R.; Day, Samuel B.; Sinkovits, Robert S.; Jones, Taylor; Schmitz, Samuel; Meiler, Jens; Branchizio, Andre; Crowe, James E.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7364545/
https://www.ncbi.nlm.nih.gov/pubmed/32677886
http://dx.doi.org/10.1186/s12859-020-03649-5
Description
Summary: BACKGROUND: Recent advances in DNA sequencing technologies have greatly increased the capacity to generate large volumes of DNA sequence data, which has spurred rapid growth in the use of bioinformatics to interrogate antibody variable gene repertoires. Common tools used for annotation of antibody sequences are often limited in functionality, modularity and usability. RESULTS: We have developed PyIR, a Python wrapper and library for IgBLAST, which offers a minimal-setup CLI and API, FASTQ support, file chunking for large sequence files, JSON and Python dictionary output, and built-in sequence filtering. CONCLUSIONS: PyIR offers improved processing speed over multithreaded IgBLAST (version 1.14) when spawning more than 16 processes on a single computer system. Its customizable filtering and data encapsulation allow it to be adapted to a wide range of computing environments. The API allows IgBLAST to be used in customized bioinformatics workflows.
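
The file chunking, filtering, and JSON output mentioned in the summary can be illustrated with a minimal sketch. This is not PyIR's implementation or API: the function names (`read_fasta`, `chunked`, `annotate_chunk`), the `min_length` filter, and the `sequence_length` field are all hypothetical, and the annotation step is a placeholder where a real wrapper would hand each chunk to an IgBLAST process.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def read_fasta(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {"id", "sequence"} dict per FASTA record (hypothetical helper)."""
    seq_id, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield {"id": seq_id, "sequence": "".join(parts)}
            header = line[1:].split()
            seq_id, parts = (header[0] if header else ""), []
        elif seq_id is not None:
            parts.append(line)
    if seq_id is not None:
        yield {"id": seq_id, "sequence": "".join(parts)}


def chunked(records: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Group records into lists of at most chunk_size items."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def annotate_chunk(chunk: List[dict], min_length: int = 0) -> List[dict]:
    """Placeholder for the annotation step (assumption, not IgBLAST output).

    Drops sequences shorter than min_length and adds a sequence_length field.
    """
    return [
        dict(rec, sequence_length=len(rec["sequence"]))
        for rec in chunk
        if len(rec["sequence"]) >= min_length
    ]


if __name__ == "__main__":
    demo = [">seq1 demo", "ACGT", "ACGT", ">seq2", "AC", ">seq3", "ACGTACGTAC"]
    for chunk in chunked(read_fasta(demo), chunk_size=2):
        for row in annotate_chunk(chunk, min_length=4):
            print(json.dumps(row))
```

Because each chunk is an independent list of records, chunks could be distributed across worker processes, which is the general pattern the summary's comparison of process counts suggests.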