
Developing a healthcare dataset information resource (DIR) based on Semantic Web



Bibliographic Details
Main authors: Shi, Jingyi; Zheng, Mingna; Yao, Lixia; Ge, Yaorong
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245488/
https://www.ncbi.nlm.nih.gov/pubmed/30453940
http://dx.doi.org/10.1186/s12920-018-0411-5
_version_ 1783372252300967936
author Shi, Jingyi
Zheng, Mingna
Yao, Lixia
Ge, Yaorong
author_facet Shi, Jingyi
Zheng, Mingna
Yao, Lixia
Ge, Yaorong
author_sort Shi, Jingyi
collection PubMed
description BACKGROUND: The right dataset is essential to obtain the right insights in data science; therefore, it is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, the lack of an information resource that focuses on the specific needs of target users of datasets has been a problem for years. To address this gap, we have developed a Dataset Information Resource (DIR), using a user-oriented approach, which gathers relevant dataset knowledge for specific user types. In the present version, we specifically address the challenges of entry-level data scientists in learning to identify, understand, and analyze major datasets in healthcare. We emphasize that the DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses.
METHODS: The DIR leverages Semantic Web technologies and the W3C Dataset Description Profile as the standard for knowledge integration and representation. To extract tailored knowledge for target users, we have developed methods for manual extraction from dataset documentation as well as semi-automatic extraction from related publications, using natural language processing (NLP)-based approaches. A semantic query component is available for knowledge retrieval, and a parameterized question-answering functionality is provided to make searching easier.
RESULTS: The DIR prototype is composed of four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The current implementation includes information on 12 commonly used large and complex healthcare datasets. The initial usage evaluation based on health informatics novices indicates that the DIR is helpful and beginner-friendly.
CONCLUSIONS: We have developed a novel user-oriented DIR that provides dataset knowledge specialized for target user groups. Knowledge about datasets is effectively represented in the Semantic Web. At this initial stage, the DIR has already been able to provide sophisticated and relevant knowledge of 12 datasets to help entry-level health informaticians learn healthcare data analysis using suitable datasets. Further development of both content and function levels is underway.
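The parameterized question answering mentioned under METHODS can be pictured as a query template with one user-supplied slot. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the triples, the `ex:` identifiers, the dataset titles, and the function names `build_query` and `answer` are all hypothetical, and the small in-memory matcher stands in for a real SPARQL endpoint. The only borrowed names are the standard `dcat:keyword` and `dcterms:title` vocabulary terms.

```python
from string import Template

# Hypothetical metadata triples (subject, predicate, object). The dataset
# names and keywords are placeholders, not entries taken from the DIR.
TRIPLES = [
    ("ex:datasetA", "dcterms:title", "Example Dataset A"),
    ("ex:datasetA", "dcat:keyword", "inpatient"),
    ("ex:datasetA", "dcat:keyword", "claims"),
    ("ex:datasetB", "dcterms:title", "Example Dataset B"),
    ("ex:datasetB", "dcat:keyword", "survey"),
]

# Parameterized SPARQL text: $keyword is the only slot the user fills in.
QUERY_TEMPLATE = Template(
    "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
    "SELECT ?title WHERE {\n"
    '  ?dataset dcat:keyword "$keyword" ;\n'
    "           dcterms:title ?title .\n"
    "}"
)


def build_query(keyword: str) -> str:
    """Fill the template, escaping backslashes and double quotes in the literal."""
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.substitute(keyword=escaped)


def answer(keyword: str, triples=TRIPLES) -> list:
    """Evaluate the same graph pattern against the in-memory triples."""
    # Subjects that carry the requested keyword.
    subjects = {s for s, p, o in triples if p == "dcat:keyword" and o == keyword}
    # Titles of those subjects, sorted for a stable result.
    return sorted(o for s, p, o in triples if p == "dcterms:title" and s in subjects)
```

With the placeholder data above, `answer("claims")` returns `["Example Dataset A"]`, and `build_query("claims")` yields the SPARQL text that would be sent to a triple store in a real deployment. Keeping the question as a fixed template means a novice user supplies only the keyword, never the query syntax.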
format Online
Article
Text
id pubmed-6245488
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6245488 2018-11-26 Developing a healthcare dataset information resource (DIR) based on Semantic Web Shi, Jingyi Zheng, Mingna Yao, Lixia Ge, Yaorong BMC Med Genomics Research BioMed Central 2018-11-20 /pmc/articles/PMC6245488/ /pubmed/30453940 http://dx.doi.org/10.1186/s12920-018-0411-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Shi, Jingyi
Zheng, Mingna
Yao, Lixia
Ge, Yaorong
Developing a healthcare dataset information resource (DIR) based on Semantic Web
title Developing a healthcare dataset information resource (DIR) based on Semantic Web
title_full Developing a healthcare dataset information resource (DIR) based on Semantic Web
title_fullStr Developing a healthcare dataset information resource (DIR) based on Semantic Web
title_full_unstemmed Developing a healthcare dataset information resource (DIR) based on Semantic Web
title_short Developing a healthcare dataset information resource (DIR) based on Semantic Web
title_sort developing a healthcare dataset information resource (dir) based on semantic web
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245488/
https://www.ncbi.nlm.nih.gov/pubmed/30453940
http://dx.doi.org/10.1186/s12920-018-0411-5
work_keys_str_mv AT shijingyi developingahealthcaredatasetinformationresourcedirbasedonsemanticweb
AT zhengmingna developingahealthcaredatasetinformationresourcedirbasedonsemanticweb
AT yaolixia developingahealthcaredatasetinformationresourcedirbasedonsemanticweb
AT geyaorong developingahealthcaredatasetinformationresourcedirbasedonsemanticweb