
Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0


Bibliographic Details
Main Authors: Kirkpatrick, Anna; Onyeze, Chidozie; Kartchner, David; Allegri, Stephen; An, Davi Nakajima; McCoy, Kevin; Davalbhakta, Evie; Mitchell, Cassie S.
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9351549/
https://www.ncbi.nlm.nih.gov/pubmed/35936510
http://dx.doi.org/10.3390/bdcc6010027
author Kirkpatrick, Anna
Onyeze, Chidozie
Kartchner, David
Allegri, Stephen
An, Davi Nakajima
McCoy, Kevin
Davalbhakta, Evie
Mitchell, Cassie S.
author_sort Kirkpatrick, Anna
collection PubMed
description Literature-based discovery (LBD) summarizes information and generates insight from large text corpora. The SemNet framework uses a large heterogeneous information network, or “knowledge graph,” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet supports multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j, improving knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA generalizes to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is comprehensive open-source software that provides a significantly faster, more effective, and more user-friendly means of automated biomedical LBD. An example case ranks relationships between Alzheimer’s disease and metabolic co-morbidities.
format Online
Article
Text
id pubmed-9351549
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-9351549 2022-08-04 Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0. Kirkpatrick, Anna; Onyeze, Chidozie; Kartchner, David; Allegri, Stephen; An, Davi Nakajima; McCoy, Kevin; Davalbhakta, Evie; Mitchell, Cassie S. Big Data Cogn Comput. Article.
2022-03 2022-03-01 /pmc/articles/PMC9351549/ /pubmed/35936510 http://dx.doi.org/10.3390/bdcc6010027 Text en https://creativecommons.org/licenses/by/4.0/ This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0
topic Article