Determining similarity of scientific entities in annotation datasets

Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness...

Bibliographic Details
Main Authors:	Palma, Guillermo, Vidal, Maria-Esther, Haag, Eric, Raschid, Louiqa, Thor, Andreas
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4343076/ https://www.ncbi.nlm.nih.gov/pubmed/25725057 http://dx.doi.org/10.1093/database/bau123

author	Palma, Guillermo Vidal, Maria-Esther Haag, Eric Raschid, Louiqa Thor, Andreas
collection	PubMed
description	Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building block for pattern mining, e.g. identifying drug–drug relationships may depend on the similarity of the targets that interact with each drug. A diversity of similarity measures has been proposed in the literature to compute relatedness between a pair of entities. Each measure exploits some knowledge including the name, function, relationships with other entities, taxonomic neighborhood and semantic knowledge. We propose a novel general-purpose annotation similarity measure called ‘AnnSim’ that measures the relatedness between two entities based on the similarity of their annotations. We model AnnSim as a 1–1 maximum weight bipartite match and exploit properties of existing solvers to provide an efficient solution. We empirically study the performance of AnnSim on real-world datasets of drugs and disease associations from clinical trials and relationships between drugs and (genomic) targets. Using baselines that include a variety of measures, we identify where AnnSim can provide a deeper understanding of the semantics underlying the relatedness of a pair of entities or where it could lead to predicting new links or identifying potential novel patterns. Although AnnSim does not exploit knowledge or properties of a particular domain, its performance compares well with a variety of state-of-the-art domain-specific measures. Database URL: http://www.yeastgenome.org/
format	Online Article Text
id	pubmed-4343076
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4343076 2015-03-17 Database (Oxford) Original Article Oxford University Press 2015-02-27 /pmc/articles/PMC4343076/ /pubmed/25725057 http://dx.doi.org/10.1093/database/bau123 Text en © The Author(s) 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Determining similarity of scientific entities in annotation datasets
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4343076/ https://www.ncbi.nlm.nih.gov/pubmed/25725057 http://dx.doi.org/10.1093/database/bau123
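
The abstract describes AnnSim as a 1–1 maximum weight bipartite match over the annotations of two entities. The following Python sketch illustrates that idea on a toy example. It is not the authors' implementation: the exhaustive search stands in for the "existing solvers" the abstract mentions and only suits very small annotation sets, the normalization by the combined set size is an assumption not stated in the abstract, and the term names and the `TOY_SIM` table are hypothetical.

```python
from itertools import permutations

# Hypothetical pairwise similarities between ontology terms (values in [0, 1]).
TOY_SIM = {
    frozenset(("t2", "t3")): 0.5,
    frozenset(("t2", "t4")): 0.25,
}


def term_sim(a, b):
    """Toy term similarity: 1.0 for identical terms, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def max_weight_matching(left, right, sim):
    """Exhaustive 1-1 maximum weight bipartite matching (tiny inputs only).

    With non-negative weights, some optimal matching pairs every element of
    the smaller side, so it is enough to enumerate injective assignments
    from the smaller side into the larger one.
    """
    swapped = len(left) > len(right)
    small, large = (right, left) if swapped else (left, right)
    best_weight, best_pairs = 0.0, []
    for chosen in permutations(large, len(small)):
        pairs = list(zip(small, chosen))
        if swapped:
            # Restore (left item, right item) order before scoring.
            pairs = [(b, a) for a, b in pairs]
        weight = sum(sim(a, b) for a, b in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight, best_pairs


def ann_sim(annotations_a, annotations_b, sim=term_sim):
    """Matching weight normalized by the combined set size (an assumption)."""
    total = len(annotations_a) + len(annotations_b)
    if total == 0:
        return 0.0
    weight, _ = max_weight_matching(list(annotations_a), list(annotations_b), sim)
    return 2.0 * weight / total


if __name__ == "__main__":
    print(ann_sim(["t1", "t2"], ["t1", "t3", "t4"]))
```

In the toy call, `t1` is matched with `t1` and `t2` with `t3`, while `t4` stays unmatched, so entities with many unmatched annotations score lower under this normalization. For realistic annotation sets, a polynomial-time assignment solver would replace the exhaustive search.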
