Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory

Bibliographic Details
Main Authors: Rosado, Eduardo, Garcia-Remesal, Miguel, Paraiso-Medina, Sergio, Pazos, Alejandro, Maojo, Victor
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7952234/
https://www.ncbi.nlm.nih.gov/pubmed/33629960
http://dx.doi.org/10.2196/22976
author Rosado, Eduardo
Garcia-Remesal, Miguel
Paraiso-Medina, Sergio
Pazos, Alejandro
Maojo, Victor
collection PubMed
description BACKGROUND: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases.
OBJECTIVE: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly.
METHODS: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. Such a data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection.
RESULTS: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to “omics” and the other related to the COVID-19 pandemic.
CONCLUSIONS: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others).
format Online
Article
Text
id pubmed-7952234
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7952234 2021-03-17 JMIR Med Inform Original Paper
JMIR Publications 2021-02-25 /pmc/articles/PMC7952234/ /pubmed/33629960 http://dx.doi.org/10.2196/22976 Text en ©Eduardo Rosado, Miguel Garcia-Remesal, Sergio Paraiso-Medina, Alejandro Pazos, Victor Maojo. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 25.02.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7952234/
https://www.ncbi.nlm.nih.gov/pubmed/33629960
http://dx.doi.org/10.2196/22976