
VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data

BACKGROUND: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data files. RE...


Bibliographic Details
Main authors: Christley, Scott; Levin, Mikhail K.; Toby, Inimary T.; Fonner, John M.; Monson, Nancy L.; Rounds, William H.; Rubelt, Florian; Scarborough, Walter; Scheuermann, Richard H.; Cowell, Lindsay G.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5637252/
https://www.ncbi.nlm.nih.gov/pubmed/29020925
http://dx.doi.org/10.1186/s12859-017-1853-z
author Christley, Scott
Levin, Mikhail K.
Toby, Inimary T.
Fonner, John M.
Monson, Nancy L.
Rounds, William H.
Rubelt, Florian
Scarborough, Walter
Scheuermann, Richard H.
Cowell, Lindsay G.
author_sort Christley, Scott
collection PubMed
description BACKGROUND: Pre-processing of high-throughput sequencing data for immune repertoire profiling is essential to ensure high-quality input for downstream analysis. VDJPipe is a flexible, high-performance tool that can perform multiple pre-processing tasks with just a single pass over the data files. RESULTS: Processing tasks provided by VDJPipe include base composition statistics calculation, read quality statistics calculation, quality filtering, homopolymer filtering, length and nucleotide filtering, paired-read merging, barcode demultiplexing, 5′ and 3′ PCR primer matching, and duplicate read collapsing. VDJPipe uses a pipeline approach in which multiple processing steps are performed in a sequential workflow, with the output of each step passed automatically as input to the next step. The workflow is flexible enough to handle the complex barcoding schemes used in many immunosequencing experiments. Because VDJPipe is designed for computational efficiency, we evaluated its performance by comparing execution times with those of pRESTO, a widely used pre-processing tool for immune repertoire sequencing data. We found that VDJPipe requires <10% of the run time required by pRESTO. CONCLUSIONS: VDJPipe is a high-performance tool that is optimized for pre-processing large immune repertoire sequencing data sets.
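The pipeline idea in the description, where each step's output feeds the next step and the input is read only once, can be illustrated with a short Python sketch. This is not VDJPipe's code, configuration format, or API. The `Read` type, the step functions, and the thresholds are all hypothetical, chosen only to show how chained lazy filters can make a single pass over the reads.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical in-memory read: name, bases, per-base Phred scores."""
    name: str
    seq: str
    quals: Tuple[int, ...]


# A step takes a stream of reads and returns a (lazily filtered) stream.
Step = Callable[[Iterator[Read]], Iterator[Read]]


def min_length(n: int) -> Step:
    """Keep reads with at least n bases."""
    return lambda reads: (r for r in reads if len(r.seq) >= n)


def min_mean_quality(q: float) -> Step:
    """Keep non-empty reads whose mean quality score is at least q."""
    return lambda reads: (
        r for r in reads if r.quals and sum(r.quals) / len(r.quals) >= q
    )


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base in seq."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def max_homopolymer(limit: int) -> Step:
    """Keep reads whose longest homopolymer run does not exceed limit."""
    return lambda reads: (r for r in reads if longest_run(r.seq) <= limit)


def run_pipeline(reads: Iterable[Read], steps: List[Step]) -> Iterator[Read]:
    """Chain the steps; nothing is read until the result is consumed."""
    stream = iter(reads)
    for step in steps:
        stream = step(stream)
    return stream


def collapse_duplicates(reads: Iterable[Read]) -> Counter:
    """Count identical sequences; this consumes the stream exactly once."""
    return Counter(r.seq for r in reads)


reads = [
    Read("r1", "ACGTACGT", (30,) * 8),
    Read("r2", "ACGTACGT", (32,) * 8),  # duplicate sequence of r1
    Read("r3", "ACG", (35,) * 3),       # too short
    Read("r4", "AAAAAAAT", (30,) * 8),  # homopolymer run of 7
    Read("r5", "ACGTTGCA", (10,) * 8),  # low mean quality
]
steps = [min_length(5), min_mean_quality(20), max_homopolymer(4)]
counts = collapse_duplicates(run_pipeline(reads, steps))
print(dict(counts))
```

Because every step is a generator expression, each read passes through all the filters before the next read is pulled from the input, so the input is traversed once regardless of how many steps are chained.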
format Online
Article
Text
id pubmed-5637252
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5637252 2017-10-18 VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data. BMC Bioinformatics, Software. BioMed Central 2017-10-11 /pmc/articles/PMC5637252/ /pubmed/29020925 http://dx.doi.org/10.1186/s12859-017-1853-z Text en © The Author(s). 2017
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title VDJPipe: a pipelined tool for pre-processing immune repertoire sequencing data
topic Software