
BED: a Biological Entity Dictionary based on a graph data model

The understanding of molecular processes involved in a specific biological system can be significantly improved by combining and comparing different data sets and knowledge resources. However, these information sources often use different identification systems and an identifier conversion step is r...


Bibliographic Details
Main authors: Godard, Patrice; van Eyll, Jonathan
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6039941/
https://www.ncbi.nlm.nih.gov/pubmed/30026924
http://dx.doi.org/10.12688/f1000research.13925.3
_version_ 1783338769794990080
author Godard, Patrice
van Eyll, Jonathan
author_facet Godard, Patrice
van Eyll, Jonathan
author_sort Godard, Patrice
collection PubMed
description The understanding of molecular processes involved in a specific biological system can be significantly improved by combining and comparing different data sets and knowledge resources. However, these information sources often use different identification systems and an identifier conversion step is required before any integration effort. Mapping between identifiers is often provided by the reference information resources and several tools have been implemented to simplify their use. However, most of these tools do not combine the information provided by individual resources to increase the completeness of the mapping process. Also, deprecated identifiers from former versions of databases are not taken into account. Finally, finding automatically the most relevant path to map identifiers from one scope to the other is often not trivial. The Biological Entity Dictionary (BED) addresses these three challenges by relying on a graph data model describing possible relationships between entities and their identifiers. This model has been implemented using Neo4j and an R package provides functions to query the graph but also to create and feed a custom instance of the database. This design combined with a local installation of the graph database and a cache system make BED very efficient to convert large lists of identifiers.
format Online
Article
Text
id pubmed-6039941
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-60399412018-07-18 BED: a Biological Entity Dictionary based on a graph data model Godard, Patrice van Eyll, Jonathan F1000Res Software Tool Article The understanding of molecular processes involved in a specific biological system can be significantly improved by combining and comparing different data sets and knowledge resources. However, these information sources often use different identification systems and an identifier conversion step is required before any integration effort. Mapping between identifiers is often provided by the reference information resources and several tools have been implemented to simplify their use. However, most of these tools do not combine the information provided by individual resources to increase the completeness of the mapping process. Also, deprecated identifiers from former versions of databases are not taken into account. Finally, finding automatically the most relevant path to map identifiers from one scope to the other is often not trivial. The Biological Entity Dictionary (BED) addresses these three challenges by relying on a graph data model describing possible relationships between entities and their identifiers. This model has been implemented using Neo4j and an R package provides functions to query the graph but also to create and feed a custom instance of the database. This design combined with a local installation of the graph database and a cache system make BED very efficient to convert large lists of identifiers. F1000 Research Limited 2018-07-19 /pmc/articles/PMC6039941/ /pubmed/30026924 http://dx.doi.org/10.12688/f1000research.13925.3 Text en Copyright: © 2018 Godard P and van Eyll J http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software Tool Article
Godard, Patrice
van Eyll, Jonathan
BED: a Biological Entity Dictionary based on a graph data model
title BED: a Biological Entity Dictionary based on a graph data model
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6039941/
https://www.ncbi.nlm.nih.gov/pubmed/30026924
http://dx.doi.org/10.12688/f1000research.13925.3