Assessing and assuring interoperability of a genomics file format
Main authors: | Niu, Yi Nian; Roberts, Eric G; Denisko, Danielle; Hoffman, Michael M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237710/ https://www.ncbi.nlm.nih.gov/pubmed/35575355 http://dx.doi.org/10.1093/bioinformatics/btac327 |

author | Niu, Yi Nian; Roberts, Eric G; Denisko, Danielle; Hoffman, Michael M |
---|---|
collection | PubMed |
description | MOTIVATION: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results. RESULTS: We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases—potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite. AVAILABILITY AND IMPLEMENTATION: Acidbio is available at https://github.com/hoffmangroup/acidbio. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9237710 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Bioinformatics (Original Papers) |
date | 2022-05-16 |
rights | © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Assessing and assuring interoperability of a genomics file format |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9237710/ https://www.ncbi.nlm.nih.gov/pubmed/35575355 http://dx.doi.org/10.1093/bioinformatics/btac327 |
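
The abstract describes scoring software by whether it accepts or rejects edge-case BED inputs the way a conforming tool should. The Python sketch below illustrates that scoring idea under explicit assumptions: the simplified BED3 rules, the edge cases, and the functions `is_valid_bed3_line`, `naive_validator`, and `correctness` are all hypothetical. They are not taken from Acidbio and do not cover the full BED specification.

```python
import re

# Hypothetical, simplified BED3 rules (an assumption for illustration only):
#   - exactly three tab-separated fields: chrom, chromStart, chromEnd
#   - chrom is non-empty and contains no whitespace
#   - chromStart and chromEnd are non-negative base-10 integers
#   - chromStart <= chromEnd
_UNSIGNED_INT = re.compile(r"[0-9]+")


def is_valid_bed3_line(line: str) -> bool:
    """Return True if `line` satisfies the simplified BED3 rules above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return False
    chrom, start, end = fields
    if not chrom or any(ch.isspace() for ch in chrom):
        return False
    if not (_UNSIGNED_INT.fullmatch(start) and _UNSIGNED_INT.fullmatch(end)):
        return False
    return int(start) <= int(end)


def naive_validator(line: str) -> bool:
    """A hypothetical lenient parser: split on any whitespace, check ints only."""
    try:
        chrom, start, end = line.split()  # wrong field count raises ValueError
        int(start)
        int(end)
        return True
    except ValueError:
        return False


def correctness(validator, cases) -> float:
    """Fraction of (line, expected) cases where the validator's verdict matches."""
    agree = sum(1 for line, expected in cases if validator(line) == expected)
    return agree / len(cases)


# Hypothetical edge cases: each pairs a BED line with whether a parser
# following the simplified rules above should accept it.
EDGE_CASES = [
    ("chr1\t0\t100", True),     # ordinary interval
    ("chr1\t100\t100", True),   # zero-length interval (start == end)
    ("chr1\t200\t100", False),  # start after end
    ("chr1\t-5\t100", False),   # negative coordinate
    ("chr1 0 100", False),      # spaces instead of tabs
    ("chr1\t0\t1e3", False),    # non-integer coordinate
]

if __name__ == "__main__":
    for name, fn in [("strict", is_valid_bed3_line), ("naive", naive_validator)]:
        print(f"{name}: {correctness(fn, EDGE_CASES):.0%}")
```

Under these assumptions, the strict checker matches every expected verdict, while `naive_validator` matches only half of them: it accepts the reversed, negative, and space-delimited lines. This is one way a tool can read a file without raising an error yet still score poorly on a conformance suite.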