
fastp: an ultra-fast all-in-one FASTQ preprocessor

Bibliographic Details
Main Authors: Chen, Shifu; Zhou, Yanqing; Chen, Yaru; Gu, Jia
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129281/
https://www.ncbi.nlm.nih.gov/pubmed/30423086
http://dx.doi.org/10.1093/bioinformatics/bty560
author Chen, Shifu
Zhou, Yanqing
Chen, Yaru
Gu, Jia
collection PubMed
description MOTIVATION: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. RESULTS: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. AVAILABILITY AND IMPLEMENTATION: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.
format Online
Article
Text
id pubmed-6129281
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6129281 2018-09-12
fastp: an ultra-fast all-in-one FASTQ preprocessor
Chen, Shifu; Zhou, Yanqing; Chen, Yaru; Gu, Jia
Bioinformatics
Eccb 2018: European Conference on Computational Biology Proceedings
Oxford University Press 2018-09-01 2018-09-08
/pmc/articles/PMC6129281/ /pubmed/30423086 http://dx.doi.org/10.1093/bioinformatics/bty560
Text en
© The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Eccb 2018: European Conference on Computational Biology Proceedings