Search Datasets in Literature: A Case Study of GWAS

Bibliographic Details
Main authors:	Dong, Xiao; Zhang, Yaoyun; Xu, Hua
Format:	Online Article Text
Language:	English
Published:	American Medical Informatics Association, 2017
Subjects:	Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5543360/ https://www.ncbi.nlm.nih.gov/pubmed/28815103

author	Dong, Xiao; Zhang, Yaoyun; Xu, Hua
collection	PubMed
description	One of the missions of the NIH BD2K (Big Data to Knowledge) initiative is to make data discoverable and promote the re-use of existing datasets. Our ultimate goal is to develop a scalable approach that can automatically scan millions of scientific publications and identify underlying data sets. Using Genome-Wide Association Studies (GWAS) as a use case, we conducted an initial study to identify GWAS dataset attributes in MEDLINE abstracts, by developing a hybrid approach that combines domain dictionaries and pattern-based rules. The automatic GWAS dataset attribute recognition system achieved an F-measure of 84.85%. We further applied the GWAS attribute recognition system to indexing MEDLINE abstracts and built an online GWAS dataset search engine called “GWAS Dataset Finder”. Our evaluation showed that the GWAS Dataset Finder outperformed PubMed significantly in retrieving literature with desired datasets. Our study demonstrates the potential application of text mining methods in building the data discovery index. It can create a better index of literature linked with their underlying data sets, thus improving data discoverability.
format	Online Article Text
id	pubmed-5543360
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	American Medical Informatics Association
record_format	MEDLINE/PubMed
spelling	pubmed-5543360 2017-08-16 Search Datasets in Literature: A Case Study of GWAS Dong, Xiao Zhang, Yaoyun Xu, Hua AMIA Jt Summits Transl Sci Proc Articles American Medical Informatics Association 2017-07-26 /pmc/articles/PMC5543360/ /pubmed/28815103 Text en ©2017 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose.
title	Search Datasets in Literature: A Case Study of GWAS
topic	Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5543360/ https://www.ncbi.nlm.nih.gov/pubmed/28815103
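
The record's description outlines a hybrid recognizer that combines domain dictionaries with pattern-based rules, is scored by F-measure, and is then used to index abstracts for search. The Python sketch below illustrates that general idea only and is not the authors' GWAS Dataset Finder code: the attribute types PHENOTYPE and SAMPLE_SIZE, the dictionary terms, the regular expression, the exact-match scoring criterion, and all function names are hypothetical assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A recognized mention: (start offset, end offset, attribute type).
Span = Tuple[int, int, str]

# Hypothetical domain dictionary of phenotype terms (illustrative only).
PHENOTYPE_TERMS = ["type 2 diabetes", "breast cancer", "schizophrenia"]

# Hypothetical pattern rule for sample sizes such as "12,345 cases".
SAMPLE_SIZE_RE = re.compile(
    r"\b\d{1,3}(?:,\d{3})*\s+(?:cases|controls|individuals)\b", re.IGNORECASE
)


def dictionary_spans(text: str) -> List[Span]:
    """Find whole-word, case-insensitive occurrences of dictionary terms."""
    spans: List[Span] = []
    for term in PHENOTYPE_TERMS:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        spans.extend((m.start(), m.end(), "PHENOTYPE") for m in pattern.finditer(text))
    return spans


def pattern_spans(text: str) -> List[Span]:
    """Apply the regular-expression rule for sample-size mentions."""
    return [(m.start(), m.end(), "SAMPLE_SIZE") for m in SAMPLE_SIZE_RE.finditer(text)]


def recognize(text: str) -> Set[Span]:
    """Hybrid recognizer: union of dictionary hits and pattern-rule hits."""
    return set(dictionary_spans(text)) | set(pattern_spans(text))


def f_measure(predicted: Set[Span], gold: Set[Span]) -> float:
    """F1 under exact span-and-type matching (an assumed criterion)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def build_index(docs: Dict[str, str]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (attribute type, lower-cased mention) to IDs of documents containing it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for start, end, label in recognize(text):
            index[(label, text[start:end].lower())].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    abstract = "We performed a GWAS of type 2 diabetes in 12,345 cases and 23,456 controls."
    for start, end, label in sorted(recognize(abstract)):
        print(label, abstract[start:end])
```

Taking the union of both sources reflects the hybrid idea: dictionaries cover known terms, while patterns catch open-ended values such as counts that no dictionary could enumerate. Running the script prints the phenotype mention followed by the two sample-size mentions.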

Similar items