Semantic enrichment of longitudinal clinical study data using the CDISC standards and the semantic statistics vocabularies

Bibliographic Details
Main authors: Leroux, Hugo; Lefort, Laurent
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4429421/
https://www.ncbi.nlm.nih.gov/pubmed/25973166
http://dx.doi.org/10.1186/s13326-015-0012-6
Collection: PubMed

Description:
BACKGROUND: There is increasing recognition of the need to improve the data capture phase of clinical studies and to share clinical data more effectively. The Health Care and Life Sciences community has embraced semantic technologies to facilitate the integration of health data from electronic health records, clinical studies and pharmaceutical research. This paper explores the integration of clinical study data exchange standards and semantic statistics vocabularies to deliver clinical data as linked data, in a format that is easier to enrich with links to complementary data sources and easier for a broad user base to consume.

METHODS: We propose a Linked Clinical Data Cube (LCDC), which combines the strengths of the RDF Data Cube and DDI-RDF vocabularies to enrich clinical data based on the CDISC standards. The CDISC standards provide the mechanisms for the data to be standardised and made more accessible and accountable, whereas the RDF Data Cube and DDI-RDF vocabularies provide novel approaches to managing large volumes of heterogeneous linked data resources.

RESULTS: We validate our approach using a large-scale longitudinal clinical study of neurodegenerative diseases. This dataset, comprising more than 1600 variables clustered in 25 different sub-domains, has been fully converted into RDF, forming one main data cube and one specialised cube for each sub-domain. One sub-domain, the Medications specialised cube, has been linked to relevant external vocabularies such as the Australian Medicines Terminology, the ATC DDD taxonomy and the DrugBank terminology. This provides new dimensions on which to query the data, promoting the exploration of drug-drug and drug-disease interactions.

CONCLUSIONS: This implementation highlights the effectiveness of combining the semantic statistics vocabularies to publish large heterogeneous data sets as linked data, and of integrating those vocabularies with the CDISC standards. In particular, it demonstrates the potential of the two vocabularies to overcome the monolithic nature of the underlying model and to improve the navigation and querying of the data from multiple angles, supporting richer analysis of clinical study data. The forecasted benefits are more efficient use of clinicians’ time and the potential to facilitate cross-study analysis.

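The abstract describes publishing each clinical sub-domain as an RDF Data Cube. As a hedged illustration only, the Python sketch below shows what a single `qb:Observation` in a medications cube might look like when serialised as Turtle. The `qb:` terms come from the W3C RDF Data Cube vocabulary; the `ex:` namespace, the helper names `literal` and `observation_turtle`, the property names (`subject`, `visit`, `medication`, `atcCode`) and the record values are hypothetical placeholders, not identifiers used by the LCDC. The sketch also omits the `qb:DataStructureDefinition` that a real cube would declare.

```python
from typing import Dict

# W3C RDF Data Cube vocabulary namespace.
QB = "http://purl.org/linked-data/cube#"
# Hypothetical namespace for illustration only; not the LCDC's actual IRIs.
EX = "http://example.org/lcdc/"


def literal(text: str) -> str:
    """Quote a plain string as a Turtle string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def observation_turtle(
    obs_id: str,
    dataset: str,
    dimensions: Dict[str, str],
    measures: Dict[str, str],
) -> str:
    """Serialise one qb:Observation belonging to `dataset` as Turtle.

    Keys of `dimensions` and `measures` become hypothetical ex: properties;
    values must already be valid Turtle terms (e.g. produced by literal()).
    A real cube would also declare these properties in a
    qb:DataStructureDefinition, which this sketch omits.
    """
    pairs = [("qb:dataSet", f"ex:{dataset}")]
    pairs += [(f"ex:{name}", value) for name, value in dimensions.items()]
    pairs += [(f"ex:{name}", value) for name, value in measures.items()]
    # Turtle predicate-object lists are separated by ";" and closed by ".".
    body = " ;\n    ".join(f"{prop} {obj}" for prop, obj in pairs)
    return (
        f"@prefix qb: <{QB}> .\n"
        f"@prefix ex: <{EX}> .\n\n"
        f"ex:{obs_id} a qb:Observation ;\n    {body} .\n"
    )


# Usage: one hypothetical baseline medication record in a Medications sub-cube.
print(
    observation_turtle(
        "obs-P001-bl-med1",
        "medications-cube",
        dimensions={"subject": literal("P001"), "visit": literal("baseline")},
        measures={"medication": literal("paracetamol"), "atcCode": literal("N02BE01")},
    )
)
```

Here the ATC code is kept as a plain literal for simplicity; modelling it instead as an IRI into an external ATC vocabulary is the kind of link that would let a cube be joined to complementary data sources, as the abstract describes.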
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Biomed Semantics (Research Article), BioMed Central, published 2015-04-09.
© Leroux and Lefort; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.