TextFormats: Simplifying the definition and parsing of text formats in bioinformatics
Text formats are common in bioinformatics, as they allow for editing and filtering using standard tools, as well as, since text formats are often human readable, manual inspection and evaluation of the data. Bioinformatics is a rapidly evolving field, hence, new techniques, new software tools, new k...
Main author: | Gonnella, Giorgio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9135226/ https://www.ncbi.nlm.nih.gov/pubmed/35617194 http://dx.doi.org/10.1371/journal.pone.0268910 |
_version_ | 1784713915595227136 |
---|---|
author | Gonnella, Giorgio |
author_facet | Gonnella, Giorgio |
author_sort | Gonnella, Giorgio |
collection | PubMed |
description | Text formats are common in bioinformatics, as they allow for editing and filtering using standard tools, as well as, since text formats are often human readable, manual inspection and evaluation of the data. Bioinformatics is a rapidly evolving field, hence, new techniques, new software tools, new kinds of data often require the definition of new formats. Often new formats are not formally described in a standard or specification document. Although software libraries are available for accessing the most common formats, writing parsers for text formats, for which no library is currently available, is a very common though tedious task, utilized by many researchers in the field. This manuscript presents the open source software library and toolset TextFormats (available at https://github.com/ggonnella/textformats), which aims at simplifying the definition and parsing of text formats. Formats specifications are written in a simple data description format using an interactive wizard. Automatic generation of data examples and automatic testing of specifications allow for checking for correctness. Given the specification for a text format, TextFormats allows parsing and writing data in that format, using several programming languages (Nim, Python, C/C++) or the provided command line and graphical user interface tools. Although designed as a general purpose software, the main target application field, for the above mentioned reasons, is expected to be in bioinformatics: Thus, the specifications of several common existing bioinformatics formats are included. |
format | Online Article Text |
id | pubmed-9135226 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-91352262022-05-27 TextFormats: Simplifying the definition and parsing of text formats in bioinformatics Gonnella, Giorgio PLoS One Research Article Text formats are common in bioinformatics, as they allow for editing and filtering using standard tools, as well as, since text formats are often human readable, manual inspection and evaluation of the data. Bioinformatics is a rapidly evolving field, hence, new techniques, new software tools, new kinds of data often require the definition of new formats. Often new formats are not formally described in a standard or specification document. Although software libraries are available for accessing the most common formats, writing parsers for text formats, for which no library is currently available, is a very common though tedious task, utilized by many researchers in the field. This manuscript presents the open source software library and toolset TextFormats (available at https://github.com/ggonnella/textformats), which aims at simplifying the definition and parsing of text formats. Formats specifications are written in a simple data description format using an interactive wizard. Automatic generation of data examples and automatic testing of specifications allow for checking for correctness. Given the specification for a text format, TextFormats allows parsing and writing data in that format, using several programming languages (Nim, Python, C/C++) or the provided command line and graphical user interface tools. Although designed as a general purpose software, the main target application field, for the above mentioned reasons, is expected to be in bioinformatics: Thus, the specifications of several common existing bioinformatics formats are included. 
Public Library of Science 2022-05-26 /pmc/articles/PMC9135226/ /pubmed/35617194 http://dx.doi.org/10.1371/journal.pone.0268910 Text en © 2022 Giorgio Gonnella https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Gonnella, Giorgio TextFormats: Simplifying the definition and parsing of text formats in bioinformatics |
title | TextFormats: Simplifying the definition and parsing of text formats in bioinformatics |
title_full | TextFormats: Simplifying the definition and parsing of text formats in bioinformatics |
title_fullStr | TextFormats: Simplifying the definition and parsing of text formats in bioinformatics |
title_full_unstemmed | TextFormats: Simplifying the definition and parsing of text formats in bioinformatics |
title_short | TextFormats: Simplifying the definition and parsing of text formats in bioinformatics |
title_sort | textformats: simplifying the definition and parsing of text formats in bioinformatics |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9135226/ https://www.ncbi.nlm.nih.gov/pubmed/35617194 http://dx.doi.org/10.1371/journal.pone.0268910 |
work_keys_str_mv | AT gonnellagiorgio textformatssimplifyingthedefinitionandparsingoftextformatsinbioinformatics |