A Survey of Bioinformatics Database and Software Usage through Mining the Literature

Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availabilit...

Bibliographic Details
Main authors: Duck, Geraint, Nenadic, Goran, Filannino, Michele, Brass, Andy, Robertson, David L., Stevens, Robert
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4917176/
https://www.ncbi.nlm.nih.gov/pubmed/27331905
http://dx.doi.org/10.1371/journal.pone.0157989
_version_ 1782438915456630784
author Duck, Geraint
Nenadic, Goran
Filannino, Michele
Brass, Andy
Robertson, David L.
Stevens, Robert
author_facet Duck, Geraint
Nenadic, Goran
Filannino, Michele
Brass, Andy
Robertson, David L.
Stevens, Robert
author_sort Duck, Geraint
collection PubMed
description Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
format Online
Article
Text
id pubmed-4917176
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4917176 2016-07-08 A Survey of Bioinformatics Database and Software Usage through Mining the Literature Duck, Geraint Nenadic, Goran Filannino, Michele Brass, Andy Robertson, David L. Stevens, Robert PLoS One Research Article Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused?
Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371. Public Library of Science 2016-06-22 /pmc/articles/PMC4917176/ /pubmed/27331905 http://dx.doi.org/10.1371/journal.pone.0157989 Text en © 2016 Duck et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Duck, Geraint
Nenadic, Goran
Filannino, Michele
Brass, Andy
Robertson, David L.
Stevens, Robert
A Survey of Bioinformatics Database and Software Usage through Mining the Literature
title A Survey of Bioinformatics Database and Software Usage through Mining the Literature
title_full A Survey of Bioinformatics Database and Software Usage through Mining the Literature
title_fullStr A Survey of Bioinformatics Database and Software Usage through Mining the Literature
title_full_unstemmed A Survey of Bioinformatics Database and Software Usage through Mining the Literature
title_short A Survey of Bioinformatics Database and Software Usage through Mining the Literature
title_sort survey of bioinformatics database and software usage through mining the literature
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4917176/
https://www.ncbi.nlm.nih.gov/pubmed/27331905
http://dx.doi.org/10.1371/journal.pone.0157989
work_keys_str_mv AT duckgeraint asurveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT nenadicgoran asurveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT filanninomichele asurveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT brassandy asurveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT robertsondavidl asurveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT stevensrobert asurveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT duckgeraint surveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT nenadicgoran surveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT filanninomichele surveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT brassandy surveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT robertsondavidl surveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature
AT stevensrobert surveyofbioinformaticsdatabaseandsoftwareusagethroughminingtheliterature