TextFormats: Simplifying the definition and parsing of text formats in bioinformatics

Bibliographic Details
Main author: Gonnella, Giorgio
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9135226/
https://www.ncbi.nlm.nih.gov/pubmed/35617194
http://dx.doi.org/10.1371/journal.pone.0268910
author Gonnella, Giorgio
collection PubMed
description Text formats are common in bioinformatics, as they allow editing and filtering with standard tools and, since they are often human-readable, manual inspection and evaluation of the data. Bioinformatics is a rapidly evolving field; hence, new techniques, new software tools, and new kinds of data often require the definition of new formats. Often, new formats are not formally described in a standard or specification document. Although software libraries are available for accessing the most common formats, writing parsers for text formats for which no library is currently available is a very common, though tedious, task for many researchers in the field. This manuscript presents the open-source software library and toolset TextFormats (available at https://github.com/ggonnella/textformats), which aims at simplifying the definition and parsing of text formats. Format specifications are written in a simple data description format using an interactive wizard. Automatic generation of data examples and automatic testing of specifications allow the specifications to be checked for correctness. Given the specification for a text format, TextFormats allows parsing and writing data in that format from several programming languages (Nim, Python, C/C++) or with the provided command-line and graphical user interface tools. Although TextFormats is designed as general-purpose software, its main application field is expected to be bioinformatics, for the reasons mentioned above; thus, the specifications of several common existing bioinformatics formats are included.
format Online
Article
Text
id pubmed-9135226
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9135226 2022-05-27
PLoS One
Research Article
Public Library of Science 2022-05-26
© 2022 Giorgio Gonnella. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title TextFormats: Simplifying the definition and parsing of text formats in bioinformatics
topic Research Article
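The record's description states that, given the specification of a text format, TextFormats can both parse and write data in that format. The following Python sketch illustrates only that general idea: one declarative description drives both a parser and a writer. It is a hypothetical example. The `SPEC` dictionary, its keys, and the `decode()`/`encode()` functions are invented for illustration and are not the TextFormats specification language or API; the repository linked in the description documents the actual interface.

```python
"""Minimal sketch of specification-driven parsing and writing.

Hypothetical illustration only: SPEC and the decode()/encode() helpers are
NOT the TextFormats API. They mimic the idea of deriving a parser and a
writer from a single declarative format description.
"""
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical spec for a tab-separated record: ordered (field name, converter)
# pairs. Each converter turns the raw text of a field into a Python value.
SPEC: Dict[str, Any] = {
    "separator": "\t",
    "fields": [
        ("name", str),
        ("start", int),
        ("end", int),
        ("score", float),
    ],
}


def decode(line: str, spec: Dict[str, Any]) -> Dict[str, Any]:
    """Parse one line into a dict of typed values according to spec."""
    parts = line.rstrip("\n").split(spec["separator"])
    fields: List[Tuple[str, Callable[[str], Any]]] = spec["fields"]
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, found {len(parts)}: {line!r}")
    # Apply each field's converter; e.g. int("x") raises ValueError on bad data.
    return {name: convert(raw) for (name, convert), raw in zip(fields, parts)}


def encode(record: Dict[str, Any], spec: Dict[str, Any]) -> str:
    """Write a dict back to a line, using the same field order as decode()."""
    return spec["separator"].join(str(record[name]) for name, _ in spec["fields"])


if __name__ == "__main__":
    rec = decode("geneA\t100\t250\t0.75", SPEC)
    print(rec)                      # {'name': 'geneA', 'start': 100, 'end': 250, 'score': 0.75}
    print(repr(encode(rec, SPEC)))  # 'geneA\t100\t250\t0.75'
```

Because `decode()` and `encode()` read the same field list, editing the specification changes both directions at once, which is the practical benefit of deriving parsers and writers from a specification rather than writing each by hand.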