
Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer

Background: The first step of virtually all next-generation sequencing analysis involves splitting the raw sequencing data into separate files using sample-specific barcodes, a process known as “demultiplexing”. However, we found that existing software for this purpose was either too inflexib...


Bibliographic Details
Main authors: Wilkins, Oscar G, Capitanchik, Charlotte, Luscombe, Nicholas M., Ule, Jernej
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8287537/
https://www.ncbi.nlm.nih.gov/pubmed/34286104
http://dx.doi.org/10.12688/wellcomeopenres.16791.1
author Wilkins, Oscar G
Capitanchik, Charlotte
Luscombe, Nicholas M.
Ule, Jernej
collection PubMed
description Background: The first step of virtually all next-generation sequencing analysis involves splitting the raw sequencing data into separate files using sample-specific barcodes, a process known as “demultiplexing”. However, we found that existing software for this purpose was either too inflexible or too computationally intensive for fast, streamlined processing of raw, single-end FASTQ files containing combinatorial barcodes. Results: Here, we introduce a fast and uniquely flexible demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both 5’ and 3’ ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex can perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than 20 minutes. Conclusions: Ultraplex greatly reduces computational burden and pipeline complexity for the demultiplexing of complex sequencing libraries, such as those produced by various CLIP and ribosome profiling protocols, and is also user-friendly, enabling streamlined, robust data processing. Ultraplex is available on PyPI and Conda and via GitHub.
format Online
Article
Text
id pubmed-8287537
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-8287537 2021-07-19 Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer Wilkins, Oscar G Capitanchik, Charlotte Luscombe, Nicholas M. Ule, Jernej Wellcome Open Res Software Tool Article F1000 Research Limited 2021-06-07 /pmc/articles/PMC8287537/ /pubmed/34286104 http://dx.doi.org/10.12688/wellcomeopenres.16791.1 Text en Copyright: © 2021 Wilkins OG et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software Tool Article
Wilkins, Oscar G
Capitanchik, Charlotte
Luscombe, Nicholas M.
Ule, Jernej
Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer
title Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8287537/
https://www.ncbi.nlm.nih.gov/pubmed/34286104
http://dx.doi.org/10.12688/wellcomeopenres.16791.1
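
Illustrative note: the abstract describes splitting reads by sample barcode and moving the UMI into the read header. The Python sketch below shows that general idea for 5’ barcodes only. It is not Ultraplex's code or command-line interface. The function names, the use of `N` to mark UMI positions in a barcode pattern, the `_UMI` header suffix, and the `no_match` bin are all assumptions made for this example. Adaptor trimming, quality trimming, 3’ barcodes and mismatch tolerance are omitted.

```python
from collections import defaultdict


def match_barcode(seq, pattern):
    """Return the UMI bases if seq starts with pattern, else None.

    Assumption for this sketch: 'N' in the pattern marks a UMI position,
    and every other character must match the read exactly.
    """
    if len(seq) < len(pattern):
        return None
    umi = []
    for base, expected in zip(seq, pattern):
        if expected == "N":
            umi.append(base)
        elif base != expected:
            return None
    return "".join(umi)


def demultiplex(records, barcodes):
    """Group (header, seq, qual) records by the first matching 5' barcode.

    The barcode/UMI prefix is trimmed from the sequence and quality string,
    and the UMI is appended to the read name as a hypothetical '_UMI' suffix.
    """
    bins = defaultdict(list)
    for header, seq, qual in records:
        for sample, pattern in barcodes.items():
            umi = match_barcode(seq, pattern)
            if umi is None:
                continue
            name, _, rest = header.partition(" ")
            new_header = f"{name}_{umi}" + (f" {rest}" if rest else "")
            cut = len(pattern)
            bins[sample].append((new_header, seq[cut:], qual[cut:]))
            break  # first matching barcode wins
        else:
            # No barcode matched: keep the read unchanged.
            bins["no_match"].append((header, seq, qual))
    return dict(bins)


# Made-up barcodes and reads, for illustration only.
barcodes = {"sample_A": "NNNACGT", "sample_B": "NNNTGCA"}
reads = [
    ("@r1 1:N:0", "GGAACGTCCCCAAAA", "I" * 15),
    ("@r2", "TTCTGCAGGGG", "I" * 11),
    ("@r3", "AAAAAAAAAA", "I" * 10),
]
bins = demultiplex(reads, barcodes)
for sample, recs in bins.items():
    print(sample, recs)
```

With the UMI stored in the read name, a downstream step can treat reads that map to the same position and share a UMI as PCR duplicates, which is the use the abstract mentions.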