2FAST2Q: a general-purpose sequence search and counting program for FASTQ files

BACKGROUND: The increasingly widespread use of next generation sequencing protocols has brought the need for the development of user-friendly raw data processing tools. Here, we explore 2FAST2Q, a versatile and intuitive standalone program capable of extracting and counting feature occurrences in FA...

Bibliographic Details
Main Authors:	Bravo, Afonso M.; Typas, Athanasios; Veening, Jan-Willem
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2022
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9615965/ https://www.ncbi.nlm.nih.gov/pubmed/36312750 http://dx.doi.org/10.7717/peerj.14041

_version_	1784820543161106432
author	Bravo, Afonso M.; Typas, Athanasios; Veening, Jan-Willem
author_facet	Bravo, Afonso M.; Typas, Athanasios; Veening, Jan-Willem
author_sort	Bravo, Afonso M.
collection	PubMed
description	BACKGROUND: The increasingly widespread use of next-generation sequencing protocols has created a need for user-friendly raw data processing tools. Here, we explore 2FAST2Q, a versatile and intuitive standalone program capable of extracting and counting feature occurrences in FASTQ files. Although 2FAST2Q was previously described as part of a CRISPRi-seq analysis pipeline, here we elaborate on the program's functionality and its broader applicability. METHODS: 2FAST2Q is built in Python, with published standalone executables for Windows, macOS, and Linux. It has a familiar user interface and uses an advanced custom sequence searching algorithm. RESULTS: Using published CRISPRi datasets in which Escherichia coli and Mycobacterium tuberculosis gene essentiality, as well as host-cell sensitivity towards SARS-CoV-2 infectivity, were tested, we demonstrate that 2FAST2Q efficiently recapitulates published output in read counts per provided feature. We further show that 2FAST2Q can be used in any experimental setup that requires feature extraction from raw reads, quickly handling Hamming distance-based mismatch alignments, nucleotide-wise Phred score filtering, custom read trimming, and sequence searching within a single program. Moreover, we exemplify how different FASTQ read filtering parameters impact downstream analysis, and suggest a default usage protocol. 2FAST2Q is easier to use and faster than currently available tools, efficiently processing CRISPRi-seq and random-barcode sequencing datasets on any up-to-date laptop, and also handling the advanced extraction of de novo features from FASTQ files. We expect that 2FAST2Q will be useful not only for people working in microbiology but also for other fields in which amplicon sequencing data is generated. 2FAST2Q is available as an executable file for all current operating systems without installation and as a Python3 module on the PyPI repository (available at https://veeninglab.com/2fast2q).
format	Online Article Text
id	pubmed-9615965
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9615965 2022-10-29 2FAST2Q: a general-purpose sequence search and counting program for FASTQ files Bravo, Afonso M.; Typas, Athanasios; Veening, Jan-Willem PeerJ Bioinformatics PeerJ Inc. 2022-10-25 /pmc/articles/PMC9615965/ /pubmed/36312750 http://dx.doi.org/10.7717/peerj.14041 Text en ©2022 Bravo et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle	Bioinformatics Bravo, Afonso M. Typas, Athanasios Veening, Jan-Willem 2FAST2Q: a general-purpose sequence search and counting program for FASTQ files
title	2FAST2Q: a general-purpose sequence search and counting program for FASTQ files
title_full	2FAST2Q: a general-purpose sequence search and counting program for FASTQ files
title_fullStr	2FAST2Q: a general-purpose sequence search and counting program for FASTQ files
title_full_unstemmed	2FAST2Q: a general-purpose sequence search and counting program for FASTQ files
title_short	2FAST2Q: a general-purpose sequence search and counting program for FASTQ files
title_sort	2fast2q: a general-purpose sequence search and counting program for fastq files
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9615965/ https://www.ncbi.nlm.nih.gov/pubmed/36312750 http://dx.doi.org/10.7717/peerj.14041
work_keys_str_mv	AT bravoafonsom 2fast2qageneralpurposesequencesearchandcountingprogramforfastqfiles AT typasathanasios 2fast2qageneralpurposesequencesearchandcountingprogramforfastqfiles AT veeningjanwillem 2fast2qageneralpurposesequencesearchandcountingprogramforfastqfiles
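
The description in this record mentions Hamming distance-based mismatch matching, nucleotide-wise Phred score filtering, and read trimming as steps in counting features from FASTQ reads. The following Python sketch illustrates those three ideas in general terms only. It is not 2FAST2Q's algorithm or API: the function names (hamming, passes_phred, count_features), the parameters, the Phred+33 quality encoding, and the toy reads and feature names are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def passes_phred(quality: str, min_phred: int, offset: int = 33) -> bool:
    """True if every base scores at least min_phred (Phred+33 assumed)."""
    return all(ord(ch) - offset >= min_phred for ch in quality)


def count_features(reads, features, start, length,
                   max_mismatches=1, min_phred=30):
    """Count reads per feature.

    reads: iterable of (sequence, quality) pairs.
    features: dict mapping a feature name to its sequence.
    start, length: fixed trimming window applied to every read.
    """
    counts = Counter()
    for seq, qual in reads:
        # Trim the read (and its quality string) to the feature window.
        window = seq[start:start + length]
        window_qual = qual[start:start + length]
        if len(window) < length or not passes_phred(window_qual, min_phred):
            continue
        # Distance from the trimmed read to every same-length feature.
        distances = {name: hamming(window, feat)
                     for name, feat in features.items()
                     if len(feat) == length}
        if not distances:
            continue
        best = min(distances.values())
        winners = [name for name, d in distances.items() if d == best]
        # Count only unambiguous matches within the mismatch limit.
        if best <= max_mismatches and len(winners) == 1:
            counts[winners[0]] += 1
    return counts


# Hypothetical toy data: two 8-nt features and five reads.
features = {"sgRNA_A": "ACGTACGT", "sgRNA_B": "TTTTCCCC"}
reads = [
    ("GGACGTACGTAA", "IIIIIIIIIIII"),  # exact match to sgRNA_A
    ("GGACGTACCTAA", "IIIIIIIIIIII"),  # one mismatch to sgRNA_A
    ("GGTTTTCCCCAA", "II########II"),  # low quality in window, dropped
    ("GGTTTTCCCCAA", "IIIIIIIIIIII"),  # exact match to sgRNA_B
    ("GGAAAAAAAAAA", "IIIIIIIIIIII"),  # too many mismatches, dropped
]
counts = count_features(reads, features, start=2, length=8)
print(dict(counts))  # {'sgRNA_A': 2, 'sgRNA_B': 1}
```

In this sketch a read is discarded when it ties between two features at the same distance, which is one simple way to avoid ambiguous assignments; the actual program may resolve such cases differently.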
