Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts

The recent movement towards open data in the biomedical domain has generated a large number of datasets that are publicly accessible. The Big Data to Knowledge data indexing project, biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE), has gathered these datasets in a one-stop porta...

Bibliographic Details
Main authors:	Wang, Yanshan; Rastegar-Mojarad, Majid; Komandur-Elayavilli, Ravikumar; Liu, Hongfang
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7243926/ https://www.ncbi.nlm.nih.gov/pubmed/31725862 http://dx.doi.org/10.1093/database/bax091

author	Wang, Yanshan; Rastegar-Mojarad, Majid; Komandur-Elayavilli, Ravikumar; Liu, Hongfang
collection	PubMed
description	The recent movement towards open data in the biomedical domain has generated a large number of datasets that are publicly accessible. The Big Data to Knowledge data indexing project, biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE), has gathered these datasets in a one-stop portal aiming at facilitating their reuse for accelerating scientific advances. However, as the number of biomedical datasets stored and indexed increases, it becomes more and more challenging to retrieve the relevant datasets according to researchers’ queries. In this article, we propose an information retrieval (IR) system to tackle this problem and implement it for the bioCADDIE Dataset Retrieval Challenge. The system leverages the unstructured texts of each dataset including the title and description for the dataset, and utilizes a state-of-the-art IR model, medical named entity extraction techniques, query expansion with deep learning-based word embeddings and a re-ranking strategy to enhance the retrieval performance. In empirical experiments, we compared the proposed system with 11 baseline systems using the bioCADDIE Dataset Retrieval Challenge datasets. The experimental results show that the proposed system outperforms other systems in terms of inference Average Precision and inference normalized Discounted Cumulative Gain, implying that the proposed system is a viable option for biomedical dataset retrieval. Database URL: https://github.com/yanshanwang/biocaddie2016mayodata
format	Online Article Text
id	pubmed-7243926
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7243926 2020-05-27 Database (Oxford) Original Article Oxford University Press 2017-12-20 /pmc/articles/PMC7243926/ /pubmed/31725862 http://dx.doi.org/10.1093/database/bax091 Text en © The Author(s) 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7243926/ https://www.ncbi.nlm.nih.gov/pubmed/31725862 http://dx.doi.org/10.1093/database/bax091
