
Generation of open biomedical datasets through ontology-driven transformation and integration processes

BACKGROUND: Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating con...


Bibliographic Details
Main authors: Legaz-García, María del Carmen, Miñarro-Giménez, José Antonio, Menárguez-Tortosa, Marcos, Fernández-Breis, Jesualdo Tomás
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4891880/
https://www.ncbi.nlm.nih.gov/pubmed/27255189
http://dx.doi.org/10.1186/s13326-016-0075-z
author Legaz-García, María del Carmen
Miñarro-Giménez, José Antonio
Menárguez-Tortosa, Marcos
Fernández-Breis, Jesualdo Tomás
collection PubMed
description BACKGROUND: Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine-readable semantic formats. METHODS: We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on the mappings between the entities of the data schema and the ontological infrastructure that provides the meaning to the content. Our approach permits different types of mappings and includes the possibility of defining complex transformation patterns. Once the mappings are defined, they can be automatically applied to datasets to generate logically consistent content, and the mappings can be reused in further transformation processes. RESULTS: The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open, biomedical datasets; (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios and some lessons learned. CONCLUSIONS: We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT is able to apply the Linked Open Data principles in the generation of the datasets, thus allowing their content to be linked to external repositories and linked open datasets to be created. SWIT datasets may contain data from multiple sources and schemas, thus becoming integrated datasets.
format Online
Article
Text
id pubmed-4891880
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4891880 2016-06-04 Generation of open biomedical datasets through ontology-driven transformation and integration processes Legaz-García, María del Carmen Miñarro-Giménez, José Antonio Menárguez-Tortosa, Marcos Fernández-Breis, Jesualdo Tomás J Biomed Semantics Research BACKGROUND: Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine-readable semantic formats. METHODS: We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on the mappings between the entities of the data schema and the ontological infrastructure that provides the meaning to the content. Our approach permits different types of mappings and includes the possibility of defining complex transformation patterns. Once the mappings are defined, they can be automatically applied to datasets to generate logically consistent content, and the mappings can be reused in further transformation processes. RESULTS: The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open, biomedical datasets; (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios and some lessons learned. CONCLUSIONS: We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT is able to apply the Linked Open Data principles in the generation of the datasets, thus allowing their content to be linked to external repositories and linked open datasets to be created. SWIT datasets may contain data from multiple sources and schemas, thus becoming integrated datasets. BioMed Central 2016-06-03 /pmc/articles/PMC4891880/ /pubmed/27255189 http://dx.doi.org/10.1186/s13326-016-0075-z Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Generation of open biomedical datasets through ontology-driven transformation and integration processes
topic Research