Federated ontology-based queries over cancer data

BACKGROUND: Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised...

Bibliographic Details
Main authors: González-Beltrán, Alejandra; Tagger, Ben; Finkelstein, Anthony
Format: Online, Article, Text
Language: English
Published: BioMed Central 2012
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3471355/
https://www.ncbi.nlm.nih.gov/pubmed/22373043
http://dx.doi.org/10.1186/1471-2105-13-S1-S9
collection PubMed
description BACKGROUND: Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline make the retrieval and integration of information difficult.
RESULTS: Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included.
CONCLUSIONS: To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures.
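The reformulation idea summarised under RESULTS (concepts are mapped to annotated attributes in each data source, and join conditions are found where two sources share an annotation) can be illustrated with a toy sketch. This is not the authors' algorithm and it does not use the caGrid API: the service names, class names, attribute names and concept identifiers below are all hypothetical, chosen only to show the general shape of the technique.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Attribute:
    source: str   # hypothetical data-service name
    cls: str      # class in that source's structural model
    name: str     # attribute name
    concept: str  # made-up ontology concept annotating the attribute


# Hypothetical annotated structural models of two data sources.
ANNOTATIONS = [
    Attribute("ClinicalService", "Patient", "patientId", "ex:PatientIdentifier"),
    Attribute("ClinicalService", "Patient", "diagnosis", "ex:Diagnosis"),
    Attribute("PathologyService", "Specimen", "donorId", "ex:PatientIdentifier"),
    Attribute("PathologyService", "Specimen", "tumourGrade", "ex:TumourGrade"),
]


def reformulate(concepts, annotations=ANNOTATIONS):
    """Turn a set of requested concepts into per-source selections plus joins."""
    # Step 1: find the attributes annotated with the requested concepts.
    selected = [a for a in annotations if a.concept in concepts]
    sources = sorted({a.source for a in selected})
    selections = {
        s: [f"{a.cls}.{a.name}" for a in selected if a.source == s]
        for s in sources
    }
    # Step 2: for each pair of involved sources, propose a join wherever
    # an attribute in one shares its concept annotation with one in the other.
    joins = []
    for left, right in combinations(sources, 2):
        for a in annotations:
            if a.source != left:
                continue
            for b in annotations:
                if b.source == right and a.concept == b.concept:
                    joins.append(
                        (f"{left}.{a.cls}.{a.name}", f"{right}.{b.cls}.{b.name}")
                    )
    return {"selections": selections, "joins": joins}


plan = reformulate({"ex:Diagnosis", "ex:TumourGrade"})
print(plan)
```

In this sketch a query for diagnosis and tumour grade touches both hypothetical services, and the shared patient-identifier annotation supplies the join condition even though the two attributes have different names. A real system would still have to translate the resulting plan into the target query language of each source.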
id pubmed-3471355
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3471355, 2012-10-18. BMC Bioinformatics (Research). BioMed Central, 2012-01-25. Text, en.
Copyright ©2012 González-Beltrán et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research