
SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files


Bibliographic Details
Main Authors: Telatin, Andrea; Fariselli, Piero; Birolo, Giovanni
Format: Online, Article, Text
Language: English
Published: MDPI 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8148589/
https://www.ncbi.nlm.nih.gov/pubmed/34066939
http://dx.doi.org/10.3390/bioengineering8050059
author Telatin, Andrea
Fariselli, Piero
Birolo, Giovanni
collection PubMed
description Sequence file formats (FASTA and FASTQ) are commonly used in bioinformatics, molecular biology and biochemistry. With the advent of next-generation sequencing (NGS) technologies, the number of FASTQ datasets produced and analyzed has grown exponentially, urging the development of dedicated software to handle, parse, and manipulate such files efficiently. Several bioinformatics packages are available to filter and manipulate FASTA and FASTQ files, yet some essential tasks remain poorly supported, leaving gaps that NGS analysis workflows must fill with custom scripts. This can introduce harmful variability and performance bottlenecks in pivotal steps. Here we present a suite of tools, called SeqFu (Sequence Fastx utilities), that provides a broad range of commands to perform both common and specialist operations with ease and is designed to be easily integrated into high-performance analysis pipelines. SeqFu includes high-performance implementations of algorithms to interleave and deinterleave FASTQ files, merge Illumina lanes, and perform various quality controls (identification of degenerate primers, analysis of length statistics, extraction of portions of the datasets). SeqFu also dereplicates sequences from multiple files while keeping track of their provenance. SeqFu is developed in Nim for high-performance processing, is freely available, and can be installed with the popular package manager Miniconda.
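To illustrate what dereplicating sequences from multiple files while tracking their provenance involves, here is a minimal Python sketch. It is not SeqFu's implementation (SeqFu is written in Nim) and does not reproduce its output format. It assumes FASTA input already split into lines, merges sequences case-insensitively, and uses the hypothetical helper names `read_fasta` and `dereplicate`; each unique sequence is returned with its total count and a per-sample breakdown of where it was observed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical minimal record type for this sketch: (header, sequence).
Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(samples: Dict[str, Iterable[str]]) -> List[Tuple[str, int, Dict[str, int]]]:
    """Collapse identical sequences across samples, keeping per-sample counts.

    Sequences are compared case-insensitively (an assumption of this sketch).
    Returns (sequence, total_count, counts_by_sample), most abundant first.
    """
    # sequence -> sample name -> number of occurrences in that sample
    provenance: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sample, lines in samples.items():
        for _header, seq in read_fasta(lines):
            provenance[seq.upper()][sample] += 1

    result = [(seq, sum(counts.values()), dict(counts)) for seq, counts in provenance.items()]
    # Sort by decreasing abundance, then alphabetically for a deterministic order.
    result.sort(key=lambda item: (-item[1], item[0]))
    return result


if __name__ == "__main__":
    samples = {
        "sampleA": [">r1", "ACGT", ">r2", "ACGT", ">r3", "GGCC"],
        "sampleB": [">r1", "acgt", ">r2", "TTAA"],
    }
    for i, (seq, total, counts) in enumerate(dereplicate(samples), start=1):
        # Illustrative header layout only; not SeqFu's actual output format.
        print(f">seq{i} size={total} samples={counts}")
        print(seq)
```

Keying the table on the sequence itself means the per-sample counts come for free at the end; a tool processing very large files may prefer a more memory-efficient key, which this sketch does not attempt.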
format Online
Article
Text
id pubmed-8148589
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8148589 2021-05-26 SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Telatin, Andrea; Fariselli, Piero; Birolo, Giovanni. Bioengineering (Basel), Communication. MDPI 2021-05-07. /pmc/articles/PMC8148589/ /pubmed/34066939 http://dx.doi.org/10.3390/bioengineering8050059. Text, en. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files
topic Communication