Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis

BACKGROUND: Biomedical semantic indexing is a useful support tool for human curators who index and catalog the biomedical literature. OBJECTIVE: The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles f...

Bibliographic Details
Main authors: Segura Bedmar, Isabel; Martínez, Paloma; Carruana Martín, Adrián
Format: Online Article Text
Language: English
Published: JMIR Publications 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5732329/
https://www.ncbi.nlm.nih.gov/pubmed/29196280
http://dx.doi.org/10.2196/medinform.7059
author Segura Bedmar, Isabel
Martínez, Paloma
Carruana Martín, Adrián
collection PubMed
description BACKGROUND: Biomedical semantic indexing is a useful support tool for human curators who index and catalog the biomedical literature. OBJECTIVE: The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. METHODS: Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Although previous work has already exploited document similarity using a k-nearest neighbors algorithm, we represent documents as document vectors by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term in the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic stating that if 3 MeSH terms are proposed to classify an article and they share the same ancestor, they should be replaced by that ancestor. The representation of the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. RESULTS: Our experiments show promising results with an F1 of 69% on the test dataset. CONCLUSIONS: To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch is a viable solution for indexing large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time.
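The method summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact scoring formula, so the sketch assumes a term's score is the sum of the cosine similarities of the retrieved documents that carry it, which grows with both term frequency and document similarity. The function names (`cosine`, `rank_mesh_terms`, `collapse_siblings`), the toy documents, and the small single-parent hierarchy are all hypothetical. The ancestor heuristic is simplified to "3 or more proposed terms with the same direct parent", and no search engine or graph database is involved.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_mesh_terms(query_vec, corpus, k=2):
    """Rank the MeSH terms of the k documents most similar to query_vec.

    corpus is a list of (vector, mesh_terms) pairs.
    Assumed scoring: score(term) = sum of similarities of the neighbors
    annotated with that term.
    """
    scored = [(cosine(query_vec, vec), terms) for vec, terms in corpus]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, terms in neighbors:
        for term in terms:
            scores[term] += sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def collapse_siblings(terms, parent, threshold=3):
    """Replace groups of >= threshold terms sharing a direct parent by it.

    parent maps a term to its single parent (a simplification: the real
    MeSH hierarchy lets a term appear under several parents).
    """
    by_parent = defaultdict(list)
    for term in terms:
        if term in parent:
            by_parent[parent[term]].append(term)
    result = []
    for term in terms:
        p = parent.get(term)
        if p is not None and len(by_parent[p]) >= threshold:
            if p not in result:
                result.append(p)
        elif term not in result:
            result.append(term)
    return result


# Toy data, invented for illustration only.
corpus = [
    ({"asthma": 1.0, "children": 1.0}, ["Asthma", "Child"]),
    ({"asthma": 1.0, "therapy": 1.0}, ["Asthma", "Therapeutics"]),
    ({"cancer": 1.0}, ["Neoplasms"]),
]
query = {"asthma": 1.0, "children": 1.0}
ranked = rank_mesh_terms(query, corpus, k=2)

parent = {
    "Asthma": "Bronchial Diseases",
    "Bronchitis": "Bronchial Diseases",
    "Bronchiectasis": "Bronchial Diseases",
}
collapsed = collapse_siblings(
    ["Asthma", "Bronchitis", "Bronchiectasis", "Humans"], parent
)
print(ranked)
print(collapsed)
```

In this toy run, "Asthma" ranks first because it appears in both retrieved neighbors, and the three sibling terms are replaced by their shared parent while "Humans" is left unchanged. A production system would delegate the neighbor retrieval to a search engine and the hierarchy lookup to a graph query, as the abstract describes.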
format Online
Article
Text
id pubmed-5732329
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-5732329 2017-12-22 Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis Segura Bedmar, Isabel Martínez, Paloma Carruana Martín, Adrián JMIR Med Inform Original Paper JMIR Publications 2017-12-01 /pmc/articles/PMC5732329/ /pubmed/29196280 http://dx.doi.org/10.2196/medinform.7059 Text en ©Isabel Segura Bedmar, Paloma Martínez, Adrián Carruana Martín. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.12.2017. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis
topic Original Paper