Towards linked open gene mutations data

Bibliographic Details
Main Authors: Zappa, Achille; Splendiani, Andrea; Romano, Paolo
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303732/
https://www.ncbi.nlm.nih.gov/pubmed/22536974
http://dx.doi.org/10.1186/1471-2105-13-S4-S7
author Zappa, Achille
Splendiani, Andrea
Romano, Paolo
collection PubMed
description BACKGROUND: With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. METHODS: A version of the IARC TP53 Mutation database implemented in a relational database was used as a first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performance is lower than that achievable with an RDF archive, the generated data was also loaded into a dedicated system based on tools from the Jena software suite. RESULTS: We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. CONCLUSIONS: This has been the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development. The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.
format Online
Article
Text
id pubmed-3303732
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3303732 2012-03-15 Towards linked open gene mutations data Zappa, Achille Splendiani, Andrea Romano, Paolo BMC Bioinformatics Research BioMed Central 2012-03-28 /pmc/articles/PMC3303732/ /pubmed/22536974 http://dx.doi.org/10.1186/1471-2105-13-S4-S7 Text en Copyright ©2012 Zappa et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Towards linked open gene mutations data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303732/
https://www.ncbi.nlm.nih.gov/pubmed/22536974
http://dx.doi.org/10.1186/1471-2105-13-S4-S7