Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory
BACKGROUND: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. OBJECTIVE: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databas...
Main Authors: | Rosado, Eduardo; Garcia-Remesal, Miguel; Paraiso-Medina, Sergio; Pazos, Alejandro; Maojo, Victor |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
JMIR Publications
2021
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7952234/ https://www.ncbi.nlm.nih.gov/pubmed/33629960 http://dx.doi.org/10.2196/22976 |
_version_ | 1783663683039133696 |
---|---|
author | Rosado, Eduardo Garcia-Remesal, Miguel Paraiso-Medina, Sergio Pazos, Alejandro Maojo, Victor |
author_facet | Rosado, Eduardo Garcia-Remesal, Miguel Paraiso-Medina, Sergio Pazos, Alejandro Maojo, Victor |
author_sort | Rosado, Eduardo |
collection | PubMed |
description | BACKGROUND: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. OBJECTIVE: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly. METHODS: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. Such a data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. RESULTS: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to “omics” and the other related to the COVID-19 pandemic. CONCLUSIONS: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others). |
format | Online Article Text |
id | pubmed-7952234 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-79522342021-03-17 Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory Rosado, Eduardo Garcia-Remesal, Miguel Paraiso-Medina, Sergio Pazos, Alejandro Maojo, Victor JMIR Med Inform Original Paper BACKGROUND: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. OBJECTIVE: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly. METHODS: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. Such a data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. RESULTS: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to “omics” and the other related to the COVID-19 pandemic. CONCLUSIONS: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others). 
JMIR Publications 2021-02-25 /pmc/articles/PMC7952234/ /pubmed/33629960 http://dx.doi.org/10.2196/22976 Text en ©Eduardo Rosado, Miguel Garcia-Remesal, Sergio Paraiso-Medina, Alejandro Pazos, Victor Maojo. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 25.02.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper Rosado, Eduardo Garcia-Remesal, Miguel Paraiso-Medina, Sergio Pazos, Alejandro Maojo, Victor Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory |
title | Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory |
title_full | Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory |
title_fullStr | Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory |
title_full_unstemmed | Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory |
title_short | Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory |
title_sort | using machine learning to collect and facilitate remote access to biomedical databases: development of the biomedical database inventory |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7952234/ https://www.ncbi.nlm.nih.gov/pubmed/33629960 http://dx.doi.org/10.2196/22976 |
work_keys_str_mv | AT rosadoeduardo usingmachinelearningtocollectandfacilitateremoteaccesstobiomedicaldatabasesdevelopmentofthebiomedicaldatabaseinventory AT garciaremesalmiguel usingmachinelearningtocollectandfacilitateremoteaccesstobiomedicaldatabasesdevelopmentofthebiomedicaldatabaseinventory AT paraisomedinasergio usingmachinelearningtocollectandfacilitateremoteaccesstobiomedicaldatabasesdevelopmentofthebiomedicaldatabaseinventory AT pazosalejandro usingmachinelearningtocollectandfacilitateremoteaccesstobiomedicaldatabasesdevelopmentofthebiomedicaldatabaseinventory AT maojovictor usingmachinelearningtocollectandfacilitateremoteaccesstobiomedicaldatabasesdevelopmentofthebiomedicaldatabaseinventory |