
Context-Driven Automatic Subgraph Creation for Literature-Based Discovery

BACKGROUND: Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting scientific literature. Prior approaches to LBD include use of: 1) domain expertise and structured background knowledge to manually filter and explore the literature, 2) distributional s...


Bibliographic Details
Main Authors:	Cameron, Delroy; Kavuluru, Ramakanth; Rindflesch, Thomas C.; Sheth, Amit P.; Thirunarayan, Krishnaprasad; Bodenreider, Olivier
Format:	Online Article Text
Language:	English
Published:	2015
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888806/ https://www.ncbi.nlm.nih.gov/pubmed/25661592 http://dx.doi.org/10.1016/j.jbi.2015.01.014

_version_	1782434907222441984
author	Cameron, Delroy; Kavuluru, Ramakanth; Rindflesch, Thomas C.; Sheth, Amit P.; Thirunarayan, Krishnaprasad; Bodenreider, Olivier
author_sort	Cameron, Delroy
collection	PubMed
description	BACKGROUND: Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting scientific literature. Prior approaches to LBD include use of: 1) domain expertise and structured background knowledge to manually filter and explore the literature, 2) distributional statistics and graph-theoretic measures to rank interesting connections, and 3) heuristics to help eliminate spurious connections. However, manual approaches to LBD are not scalable and purely distributional approaches may not be sufficient to obtain insights into the meaning of poorly understood associations. While several graph-based approaches have the potential to elucidate associations, their effectiveness has not been fully demonstrated. A considerable degree of a priori knowledge, heuristics, and manual filtering is still required. OBJECTIVES: In this paper we implement and evaluate a context-driven, automatic subgraph creation method that captures multifaceted complex associations between biomedical concepts to facilitate LBD. Given a pair of concepts, our method automatically generates a ranked list of subgraphs, which provide informative and potentially unknown associations between such concepts. METHODS: To generate subgraphs, the set of all MEDLINE articles that contain either of the two specified concepts (A, C) is first collected. Then binary relationships or assertions, which are automatically extracted from the MEDLINE articles, called semantic predications, are used to create a labeled directed predications graph. In this predications graph, a path is represented as a sequence of semantic predications. The hierarchical agglomerative clustering (HAC) algorithm is then applied to cluster paths that are bounded by the two concepts (A, C). HAC relies on implicit semantics captured through Medical Subject Headings (MeSH) descriptors, and explicit semantics from the MeSH hierarchy, for clustering. Paths that exceed a threshold of semantic relatedness are clustered into subgraphs based on their shared context. Finally, the automatically generated clusters are provided as a ranked list of subgraphs. RESULTS: The subgraphs generated using this approach facilitated the rediscovery of 8 out of 9 existing scientific discoveries. In particular, they directly (or indirectly) led to the recovery of several intermediates (or B-concepts) between A- and C-terms, while also providing insights into the meaning of the associations. Such meaning is derived from predicates between the concepts, as well as the provenance of the semantic predications in MEDLINE. Additionally, by generating subgraphs on different thematic dimensions (such as Cellular Activity, Pharmaceutical Treatment and Tissue Function), the approach may enable a broader understanding of the nature of complex associations between concepts. Finally, in a statistical evaluation to determine the interestingness of the subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE on average. CONCLUSION: These results suggest that leveraging the implicit and explicit semantics provided by manually assigned MeSH descriptors is an effective representation for capturing the underlying context of complex associations, along multiple thematic dimensions in LBD situations.
format	Online Article Text
id	pubmed-4888806
institution	National Center for Biotechnology Information
language	English
publishDate	2015
record_format	MEDLINE/PubMed
spelling	pubmed-4888806 2016-06-01 Context-Driven Automatic Subgraph Creation for Literature-Based Discovery Cameron, Delroy; Kavuluru, Ramakanth; Rindflesch, Thomas C.; Sheth, Amit P.; Thirunarayan, Krishnaprasad; Bodenreider, Olivier J Biomed Inform Article 2015-02-07 2015-04 /pmc/articles/PMC4888806/ /pubmed/25661592 http://dx.doi.org/10.1016/j.jbi.2015.01.014 Text en http://creativecommons.org/licenses/by-nc-nd/4.0/ This manuscript version is made available under the CC BY-NC-ND 4.0 license.
title	Context-Driven Automatic Subgraph Creation for Literature-Based Discovery
topic	Article
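
The METHODS portion of the description (predications graph, paths bounded by A and C, threshold-based agglomerative clustering, ranked subgraphs) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy predications, the descriptor labels d1..d8, the Jaccard overlap used as a stand-in for MeSH-based semantic relatedness, the single-link merge rule, the 0.5 threshold, and ranking by cluster size.

```python
from itertools import combinations

# Toy semantic predications (subject, predicate, object). All values are invented.
PREDICATIONS = [
    ("A", "INHIBITS", "B1"),
    ("B1", "CAUSES", "C"),
    ("A", "AFFECTS", "B2"),
    ("B2", "ASSOCIATED_WITH", "C"),
    ("A", "TREATS", "B3"),
    ("B3", "PREVENTS", "C"),
]

# Hypothetical descriptor sets standing in for the MeSH context of each predication.
CONTEXT = {
    ("A", "INHIBITS", "B1"): {"d1", "d2"},
    ("B1", "CAUSES", "C"): {"d2", "d3"},
    ("A", "AFFECTS", "B2"): {"d1", "d2"},
    ("B2", "ASSOCIATED_WITH", "C"): {"d3"},
    ("A", "TREATS", "B3"): {"d7"},
    ("B3", "PREVENTS", "C"): {"d8"},
}


def build_graph(predications):
    """Labeled directed graph: subject -> list of (predicate, object)."""
    graph = {}
    for subj, pred, obj in predications:
        graph.setdefault(subj, []).append((pred, obj))
    return graph


def find_paths(graph, start, goal, max_len=3):
    """Enumerate simple paths from start to goal as tuples of predications."""
    paths = []

    def walk(node, path, seen):
        if node == goal and path:
            paths.append(tuple(path))
            return
        if len(path) == max_len:
            return
        for pred, nxt in graph.get(node, []):
            if nxt not in seen:
                walk(nxt, path + [(node, pred, nxt)], seen | {nxt})

    walk(start, [], {start})
    return paths


def path_context(path, context):
    """Union of the descriptor sets of every predication on the path."""
    descriptors = set()
    for predication in path:
        descriptors |= context.get(predication, set())
    return descriptors


def jaccard(a, b):
    """Set overlap, used here only as a placeholder relatedness measure."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_paths(paths, context, threshold=0.5):
    """Single-link agglomerative clustering; stops when no pair reaches threshold."""
    ctx = {p: path_context(p, context) for p in paths}
    clusters = [[p] for p in paths]

    def sim(c1, c2):
        return max(jaccard(ctx[p], ctx[q]) for p in c1 for q in c2)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sim(clusters[ij[0]], clusters[ij[1]]),
        )
        if sim(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    # Ranking by number of paths is an assumption, not the paper's criterion.
    return sorted(clusters, key=len, reverse=True)


paths = find_paths(build_graph(PREDICATIONS), "A", "C")
subgraphs = cluster_paths(paths, CONTEXT)
for rank, subgraph in enumerate(subgraphs, start=1):
    print(rank, [[f"{s} {p} {o}" for s, p, o in path] for path in subgraph])
```

In this toy input the paths through B1 and B2 share the same descriptor context and end up in one subgraph, while the path through B3 stays separate. A closer approximation of the described method would presumably replace `jaccard` with a relatedness measure that also uses the MeSH hierarchy.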
