Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software
BACKGROUND: Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number...
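Main authors: | Gardner, Paul P.; Paterson, James M.; McGimpsey, Stephanie; Ashari-Ghomi, Fatemeh; Umu, Sinan U.; Pawlik, Aleksandra; Gavryushkin, Alex; Black, Michael A. |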
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8851831/ https://www.ncbi.nlm.nih.gov/pubmed/35172880 http://dx.doi.org/10.1186/s13059-022-02625-x |
_version_ | 1784652906616586240 |
---|---|
author | Gardner, Paul P.; Paterson, James M.; McGimpsey, Stephanie; Ashari-Ghomi, Fatemeh; Umu, Sinan U.; Pawlik, Aleksandra; Gavryushkin, Alex; Black, Michael A. |
author_sort | Gardner, Paul P. |
collection | PubMed |
description | BACKGROUND: Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software. RESULTS: We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-road in terms of accuracy and speed trade-offs. CONCLUSIONS: Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish—possibly due to author, editor and reviewer practises. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s13059-022-02625-x). |
format | Online Article Text |
id | pubmed-8851831 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8851831; 2022-02-22; Genome Biol; Research; BioMed Central; 2022-02-16; /pmc/articles/PMC8851831/; /pubmed/35172880; http://dx.doi.org/10.1186/s13059-022-02625-x; Text; en; © The Author(s) 2022, corrected publication 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8851831/ https://www.ncbi.nlm.nih.gov/pubmed/35172880 http://dx.doi.org/10.1186/s13059-022-02625-x |