SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q file include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particul...

Bibliographic Details
Main Authors: Shen, Wei; Le, Shuai; Li, Yan; Hu, Fuquan
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5051824/
https://www.ncbi.nlm.nih.gov/pubmed/27706213
http://dx.doi.org/10.1371/journal.pone.0163962
_version_ 1782458149168480256
author Shen, Wei
Le, Shuai
Li, Yan
Hu, Fuquan
collection PubMed
description FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q file include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly. This paper describes a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OSX, and can be directly used without any dependencies or pre-configurations. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on Github at https://github.com/shenwei356/seqkit.
format Online
Article
Text
id pubmed-5051824
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5051824 2016-10-27 SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation Shen, Wei Le, Shuai Li, Yan Hu, Fuquan PLoS One Research Article FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q file include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly. This paper describes a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OSX, and can be directly used without any dependencies or pre-configurations. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on Github at https://github.com/shenwei356/seqkit. Public Library of Science 2016-10-05 /pmc/articles/PMC5051824/ /pubmed/27706213 http://dx.doi.org/10.1371/journal.pone.0163962 Text en © 2016 Shen et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5051824/
https://www.ncbi.nlm.nih.gov/pubmed/27706213
http://dx.doi.org/10.1371/journal.pone.0163962