FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale
MOTIVATION: FASTA files are the de facto standard for sharing, manipulating and storing biological sequences, while concatenated in multiFASTA they tend to be unwieldy for two main reasons: (i) they can become big enough that their manipulation with standard text-editing tools is unpractical, either...
Main authors: | Delehelle, Franklin; Roest Crollius, Hugues |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9875552/ https://www.ncbi.nlm.nih.gov/pubmed/36713287 http://dx.doi.org/10.1093/bioadv/vbac091 |
_version_ | 1784877983680430080 |
---|---|
author | Delehelle, Franklin Roest Crollius, Hugues |
author_facet | Delehelle, Franklin Roest Crollius, Hugues |
author_sort | Delehelle, Franklin |
collection | PubMed |
description | MOTIVATION: FASTA files are the de facto standard for sharing, manipulating and storing biological sequences, while concatenated in multiFASTA they tend to be unwieldy for two main reasons: (i) they can become big enough that their manipulation with standard text-editing tools is unpractical, either due to slowness or memory consumption; (ii) by mixing metadata (headers) and data (sequences), bulk operations using standard text streaming tools (such as sed or awk) are impossible without including a parsing step, which may be error-prone and introduce friction in the development process. RESULTS: Here, we present FUSTA (FUse for faSTA), a software utility which makes use of the FUSE technology to expose a multiFASTA file as a hierarchy of virtual files, letting users operate directly on the sequences as independent virtual files through classical file manipulation methods. AVAILABILITY AND IMPLEMENTATION: FUSTA is freely available under the CeCILL-C (LGPLv3-compatible) license at https://github.com/delehef/fusta. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. |
format | Online Article Text |
id | pubmed-9875552 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9875552 2023-01-26 FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale Delehelle, Franklin Roest Crollius, Hugues Bioinform Adv Application Note MOTIVATION: FASTA files are the de facto standard for sharing, manipulating and storing biological sequences, while concatenated in multiFASTA they tend to be unwieldy for two main reasons: (i) they can become big enough that their manipulation with standard text-editing tools is unpractical, either due to slowness or memory consumption; (ii) by mixing metadata (headers) and data (sequences), bulk operations using standard text streaming tools (such as sed or awk) are impossible without including a parsing step, which may be error-prone and introduce friction in the development process. RESULTS: Here, we present FUSTA (FUse for faSTA), a software utility which makes use of the FUSE technology to expose a multiFASTA file as a hierarchy of virtual files, letting users operate directly on the sequences as independent virtual files through classical file manipulation methods. AVAILABILITY AND IMPLEMENTATION: FUSTA is freely available under the CeCILL-C (LGPLv3-compatible) license at https://github.com/delehef/fusta. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. Oxford University Press 2022-11-29 /pmc/articles/PMC9875552/ /pubmed/36713287 http://dx.doi.org/10.1093/bioadv/vbac091 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Application Note Delehelle, Franklin Roest Crollius, Hugues FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale |
title | FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale |
title_full | FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale |
title_fullStr | FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale |
title_full_unstemmed | FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale |
title_short | FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale |
title_sort | fusta: leveraging fuse for manipulation of multifasta files at scale |
topic | Application Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9875552/ https://www.ncbi.nlm.nih.gov/pubmed/36713287 http://dx.doi.org/10.1093/bioadv/vbac091 |
work_keys_str_mv | AT delehellefranklin fustaleveragingfuseformanipulationofmultifastafilesatscale AT roestcrolliushugues fustaleveragingfuseformanipulationofmultifastafilesatscale |
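
The abstract notes that bulk operations on a multiFASTA file require a parsing step because headers and sequences are interleaved. As an illustration only, the following Python sketch shows what such a parsing step might look like and how it gives a per-sequence view of the data. It is not FUSTA's implementation and does not use FUSE; the function names `parse_multifasta` and `sequences_by_id` and the toy records are hypothetical, introduced purely for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_multifasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multiFASTA file.

    Hypothetical helper for illustration; not part of FUSTA.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may span several lines; collect the pieces.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequences_by_id(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's identifier (first word of its header) to its sequence."""
    return {
        (header.split() or [""])[0]: seq
        for header, seq in parse_multifasta(lines)
    }


# Toy input, made up for this example.
example = """>seq1 first toy record
ACGT
acgt
>seq2 second toy record
GGCC
"""

records = sequences_by_id(example.splitlines())

# A bulk operation that touches only sequence data, never the headers.
upper = {name: seq.upper() for name, seq in records.items()}
```

Once headers and sequences are separated in this way, each sequence can be processed on its own, which is the kind of access the abstract describes FUSTA as providing through virtual files instead of an explicit parsing step.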