Cargando…

SemNet: Using Local Features to Navigate the Biomedical Concept Graph

Literature-Based Discovery (LBD) aims to connect scientists across silos by assembling models of the literature to reveal previously hidden connections. Unfortunately, LBD systems have been unable to achieve user adoption on a large scale. This work develops opens source software in Python to conver...

Descripción completa

Detalles Bibliográficos
Autores principales: Sedler, Andrew R., Mitchell, Cassie S.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616276/
https://www.ncbi.nlm.nih.gov/pubmed/31334227
http://dx.doi.org/10.3389/fbioe.2019.00156
_version_ 1783433475234201600
author Sedler, Andrew R.
Mitchell, Cassie S.
author_facet Sedler, Andrew R.
Mitchell, Cassie S.
author_sort Sedler, Andrew R.
collection PubMed
description Literature-Based Discovery (LBD) aims to connect scientists across silos by assembling models of the literature to reveal previously hidden connections. Unfortunately, LBD systems have been unable to achieve user adoption on a large scale. This work develops opens source software in Python to convert a database of semantic predications of all of PubMed's 27.9 million indexed abstracts into a semantic inference network and biomedical concept graph in Neo4j. The developed software, called SemNet, queries a modified version of the publicly available SemMedDB and computes feature vectors on source-target pairs. Each unique United Medical Language System (UMLS) concept is represented as a node and each predication as an edge. Each node is assigned one of 132 node labels (e.g., Amino Acid, Peptide, or Protein (AAPP); Gene or Genome (GG); etc.) and each edge is labeled with one of 58 predications (e.g. treats, causes, inhibits, etc.). SemNet computes a single feature value for each metapath, or sequence of node types, between a source node and user-specified target node(s). Several different types of metapath-based features (count, degree weighted path count, and HeteSim metric) are computed and vectorized. SemNet employs an unsupervised learning algorithm for rank aggregation (ULARA) to rank identified source nodes that are most relevant to the user-specified target nodes(s). Statistical analysis of correlation among identified source nodes or resultant literature network features are used to identify patterns that can guide future research. Analysis of high residual nodes is used to compare and contrast SemNet rankings between different targets of interest. An example SemNet use case is presented to assess “the differential impact of smoking on cognition in males and females” using the following target nodes: nicotine, learning, memory, tetrahydrocannabinol (THC), cigarette smoke, X chromosome, and Y chromosome. Detailed rankings are discussed. Overall results suggest a hypothesis where smoking negatively impacts cognition to a greater extent in females, but smoking has stronger cardiovascular impacts in males. In summary, SemNet provides an adoptable method for efficient LBD of PubMed that extends beyond omics-only relationships to true multi-scalar connections that can provide actionable insight for predictive medicine, research prioritization, and clinical care.
format Online
Article
Text
id pubmed-6616276
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-66162762019-07-22 SemNet: Using Local Features to Navigate the Biomedical Concept Graph Sedler, Andrew R. Mitchell, Cassie S. Front Bioeng Biotechnol Bioengineering and Biotechnology Literature-Based Discovery (LBD) aims to connect scientists across silos by assembling models of the literature to reveal previously hidden connections. Unfortunately, LBD systems have been unable to achieve user adoption on a large scale. This work develops opens source software in Python to convert a database of semantic predications of all of PubMed's 27.9 million indexed abstracts into a semantic inference network and biomedical concept graph in Neo4j. The developed software, called SemNet, queries a modified version of the publicly available SemMedDB and computes feature vectors on source-target pairs. Each unique United Medical Language System (UMLS) concept is represented as a node and each predication as an edge. Each node is assigned one of 132 node labels (e.g., Amino Acid, Peptide, or Protein (AAPP); Gene or Genome (GG); etc.) and each edge is labeled with one of 58 predications (e.g. treats, causes, inhibits, etc.). SemNet computes a single feature value for each metapath, or sequence of node types, between a source node and user-specified target node(s). Several different types of metapath-based features (count, degree weighted path count, and HeteSim metric) are computed and vectorized. SemNet employs an unsupervised learning algorithm for rank aggregation (ULARA) to rank identified source nodes that are most relevant to the user-specified target nodes(s). Statistical analysis of correlation among identified source nodes or resultant literature network features are used to identify patterns that can guide future research. Analysis of high residual nodes is used to compare and contrast SemNet rankings between different targets of interest. An example SemNet use case is presented to assess “the differential impact of smoking on cognition in males and females” using the following target nodes: nicotine, learning, memory, tetrahydrocannabinol (THC), cigarette smoke, X chromosome, and Y chromosome. Detailed rankings are discussed. Overall results suggest a hypothesis where smoking negatively impacts cognition to a greater extent in females, but smoking has stronger cardiovascular impacts in males. In summary, SemNet provides an adoptable method for efficient LBD of PubMed that extends beyond omics-only relationships to true multi-scalar connections that can provide actionable insight for predictive medicine, research prioritization, and clinical care. Frontiers Media S.A. 2019-07-03 /pmc/articles/PMC6616276/ /pubmed/31334227 http://dx.doi.org/10.3389/fbioe.2019.00156 Text en Copyright © 2019 Sedler and Mitchell. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Bioengineering and Biotechnology
Sedler, Andrew R.
Mitchell, Cassie S.
SemNet: Using Local Features to Navigate the Biomedical Concept Graph
title SemNet: Using Local Features to Navigate the Biomedical Concept Graph
title_full SemNet: Using Local Features to Navigate the Biomedical Concept Graph
title_fullStr SemNet: Using Local Features to Navigate the Biomedical Concept Graph
title_full_unstemmed SemNet: Using Local Features to Navigate the Biomedical Concept Graph
title_short SemNet: Using Local Features to Navigate the Biomedical Concept Graph
title_sort semnet: using local features to navigate the biomedical concept graph
topic Bioengineering and Biotechnology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616276/
https://www.ncbi.nlm.nih.gov/pubmed/31334227
http://dx.doi.org/10.3389/fbioe.2019.00156
work_keys_str_mv AT sedlerandrewr semnetusinglocalfeaturestonavigatethebiomedicalconceptgraph
AT mitchellcassies semnetusinglocalfeaturestonavigatethebiomedicalconceptgraph