
An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival

BACKGROUND: Cancer is the second leading cause of death in the United States, exceeded only by heart disease. Extant cancer survival analyses have primarily focused on individual-level factors due to limited data availability from a single data source. There is a need to integrate data from differen...


Bibliographic Details
Main authors: Zhang, Hansi, Guo, Yi, Li, Qian, George, Thomas J., Shenkman, Elizabeth, Modave, François, Bian, Jiang
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069766/
https://www.ncbi.nlm.nih.gov/pubmed/30066664
http://dx.doi.org/10.1186/s12911-018-0636-4
author Zhang, Hansi
Guo, Yi
Li, Qian
George, Thomas J.
Shenkman, Elizabeth
Modave, François
Bian, Jiang
author_sort Zhang, Hansi
collection PubMed
description BACKGROUND: Cancer is the second leading cause of death in the United States, exceeded only by heart disease. Extant cancer survival analyses have primarily focused on individual-level factors due to limited data availability from a single data source. There is a need to integrate data from different sources to simultaneously study as many risk factors as possible. Thus, we proposed an ontology-based approach to integrate heterogeneous datasets, addressing key data integration challenges. METHODS: Following best practices in ontology engineering, we created the Ontology for Cancer Research Variables (OCRV), adapting existing semantic resources such as the National Cancer Institute (NCI) Thesaurus. Using the global-as-view data integration approach, we created mapping axioms to link the data elements in different sources to OCRV. Building on the Ontop platform, we implemented a data integration pipeline that uses semantic queries to query, extract, and transform data in relational databases into a pooled dataset according to the downstream multi-level Integrative Data Analysis (IDA) needs. RESULTS: Based on our use cases in the cancer survival IDA, we created tailored ontological structures in OCRV to facilitate the data integration tasks. Specifically, we created a flexible framework addressing key integration challenges: (1) using a shared, controlled vocabulary to make data understandable to both humans and computers, (2) explicitly modeling the semantic relationships so that the data can be computed and reasoned with, (3) linking patients to contextual and environmental factors through geographic variables, and (4) documenting the data manipulation and integration processes clearly in the ontologies.
CONCLUSIONS: Using an ontology-based data integration approach not only standardizes the definitions of data variables through a common, controlled vocabulary, but also makes the semantic relationships among variables from different sources explicit and clear to all users of the same datasets. Such an approach resolves the ambiguity in variable selection, extraction, and integration processes and thus improves reproducibility of the IDA.
format Online
Article
Text
id pubmed-6069766
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6069766 2018-08-03 An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival Zhang, Hansi Guo, Yi Li, Qian George, Thomas J. Shenkman, Elizabeth Modave, François Bian, Jiang BMC Med Inform Decis Mak Research BioMed Central 2018-07-23 /pmc/articles/PMC6069766/ /pubmed/30066664 http://dx.doi.org/10.1186/s12911-018-0636-4 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An ontology-guided semantic data integration framework to support integrative data analysis of cancer survival
topic Research