bioNerDS: exploring bioinformatics’ database and software use through literature mining

BACKGROUND: Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While sch...

Bibliographic Details
Main Authors:	Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L; Stevens, Robert
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3693927/ https://www.ncbi.nlm.nih.gov/pubmed/23768135 http://dx.doi.org/10.1186/1471-2105-14-194

_version_	1782274775442259968
author	Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L; Stevens, Robert
author_facet	Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L; Stevens, Robert
author_sort	Duck, Geraint
collection	PubMed
description	BACKGROUND: Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. RESULTS: We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63% to 78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrates some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. CONCLUSIONS: We demonstrate the feasibility of automatically identifying resource names on a large scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/.
format	Online Article Text
id	pubmed-3693927
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3693927 2013-06-28 bioNerDS: exploring bioinformatics’ database and software use through literature mining Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L; Stevens, Robert BMC Bioinformatics Research Article BioMed Central 2013-06-15 /pmc/articles/PMC3693927/ /pubmed/23768135 http://dx.doi.org/10.1186/1471-2105-14-194 Text en Copyright © 2013 Duck et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	bioNerDS: exploring bioinformatics’ database and software use through literature mining
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3693927/ https://www.ncbi.nlm.nih.gov/pubmed/23768135 http://dx.doi.org/10.1186/1471-2105-14-194