Assessing and assuring interoperability of a genomics file format

Bibliographic details
Main authors: Niu, Yi Nian; Roberts, Eric G; Denisko, Danielle; Hoffman, Michael M
Format: Online article, text
Language: English
Published: Oxford University Press, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237710/
https://www.ncbi.nlm.nih.gov/pubmed/35575355
http://dx.doi.org/10.1093/bioinformatics/btac327

Abstract
MOTIVATION: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results.
RESULTS: We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases: potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite.
AVAILABILITY AND IMPLEMENTATION: Acidbio is available at https://github.com/hoffmangroup/acidbio.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal: Bioinformatics (Original Papers), 2022-05-16
Copyright: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.