Towards linked open gene mutations data

Bibliographic Details
Main authors:	Zappa, Achille; Splendiani, Andrea; Romano, Paolo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303732/ https://www.ncbi.nlm.nih.gov/pubmed/22536974 http://dx.doi.org/10.1186/1471-2105-13-S4-S7

collection	PubMed
description	BACKGROUND: With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data into the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers that publish these data.
METHODS: A version of the IARC TP53 Mutation database implemented in a relational database was used as the first test set. Automatic mappings to RDF were first created with D2RQ and later refined manually by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performance is lower than that achievable with an RDF store, the generated data were also loaded into a dedicated system based on tools from the Jena software suite.
RESULTS: We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by following links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored with any Semantic Web browser or application.
CONCLUSIONS: This is the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development. The publication of variation information as Linked Data opens new perspectives: SPARQL searches across mutation data and other biological databases may support data retrieval that is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.
id	pubmed-3303732
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
dates	2012-03-15, 2012-03-28
rights	Copyright ©2012 Zappa et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Similar items