Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software

Bibliographic Details
Main Authors: Gardner, Paul P., Paterson, James M., McGimpsey, Stephanie, Ashari-Ghomi, Fatemeh, Umu, Sinan U., Pawlik, Aleksandra, Gavryushkin, Alex, Black, Michael A.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8851831/
https://www.ncbi.nlm.nih.gov/pubmed/35172880
http://dx.doi.org/10.1186/s13059-022-02625-x
author Gardner, Paul P.
Paterson, James M.
McGimpsey, Stephanie
Ashari-Ghomi, Fatemeh
Umu, Sinan U.
Pawlik, Aleksandra
Gavryushkin, Alex
Black, Michael A.
collection PubMed
description BACKGROUND: Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software. RESULTS: We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-the-road in terms of accuracy and speed trade-offs. CONCLUSIONS: Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish, possibly due to author, editor and reviewer practices. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02625-x.
format Online
Article
Text
id pubmed-8851831
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8851831 2022-02-22 Genome Biol Research BioMed Central 2022-02-16 /pmc/articles/PMC8851831/ /pubmed/35172880 http://dx.doi.org/10.1186/s13059-022-02625-x Text en © The Author(s) 2022, corrected publication 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software
topic Research