
Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models


Bibliographic Details
Main authors: Millikin, Robert J., Raja, Kalpana, Steill, John, Lock, Cannon, Tu, Xuancheng, Ross, Ian, Tsoi, Lam C., Kuusisto, Finn, Ni, Zijian, Livny, Miron, Bockelman, Brian, Thomson, James, Stewart, Ron
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10619245/
https://www.ncbi.nlm.nih.gov/pubmed/37915001
http://dx.doi.org/10.1186/s12859-023-05539-y
_version_ 1785129945441239040
author Millikin, Robert J.
Raja, Kalpana
Steill, John
Lock, Cannon
Tu, Xuancheng
Ross, Ian
Tsoi, Lam C.
Kuusisto, Finn
Ni, Zijian
Livny, Miron
Bockelman, Brian
Thomson, James
Stewart, Ron
author_sort Millikin, Robert J.
collection PubMed
description BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A–B–C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues.
RESULTS: We demonstrate SKiM’s ability to discover useful A–B–C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches.
CONCLUSIONS: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05539-y.
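The A–B–C search described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the record states only that SKiM finds statistically significant links from co-occurrence, so the choice of a one-sided Fisher's exact test, the 0.05 threshold, the toy corpus, and every function name below (`fisher_right_tail`, `cooccurrence_p`, `skim_sketch`) are assumptions made for illustration.

```python
from math import comb


def fisher_right_tail(both, x_only, y_only, neither):
    """One-sided Fisher's exact test: P(overlap >= both) under independence.

    Assumption: this test is chosen for illustration; the record does not
    say which significance test SKiM uses.
    """
    n_x = both + x_only                      # documents mentioning x
    n_y = both + y_only                      # documents mentioning y
    total = both + x_only + y_only + neither
    upper = min(n_x, n_y)
    # Hypergeometric tail: sum the probability of every overlap >= observed.
    tail = sum(comb(n_x, k) * comb(total - n_x, n_y - k)
               for k in range(both, upper + 1))
    return tail / comb(total, n_y)


def cooccurrence_p(docs, x, y):
    """Build the 2x2 co-occurrence table for terms x and y and test it."""
    both = sum(1 for d in docs if x in d and y in d)
    x_only = sum(1 for d in docs if x in d and y not in d)
    y_only = sum(1 for d in docs if y in d and x not in d)
    neither = len(docs) - both - x_only - y_only
    return fisher_right_tail(both, x_only, y_only, neither)


def skim_sketch(docs, a, b_terms, c_terms, alpha=0.05):
    """Return (A, B, C) triples where both A-B and B-C co-occur significantly."""
    links = []
    for b in b_terms:
        if cooccurrence_p(docs, a, b) >= alpha:
            continue                         # B is not linked to A; skip its C terms
        for c in c_terms:
            if cooccurrence_p(docs, b, c) < alpha:
                links.append((a, b, c))
    return links


# Toy corpus (invented): each document is the set of terms it mentions.
docs = ([{"A", "B1"}] * 5 + [{"B1", "C1"}] * 5
        + [{"B2", "C2"}] * 5 + [{"other"}] * 5)
links = skim_sketch(docs, "A", ["B1", "B2"], ["C1", "C2"])
print(links)
```

In this toy corpus, A and C1 never appear in the same document, yet they are connected through B1, which is the kind of indirect link LBD aims to surface. Filtering B terms first keeps the second pass small, which matters when the C list runs to thousands of terms.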
format Online
Article
Text
id pubmed-10619245
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10619245 2023-11-02 BMC Bioinformatics Software BioMed Central 2023-11-01 /pmc/articles/PMC10619245/ /pubmed/37915001 http://dx.doi.org/10.1186/s12859-023-05539-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models
topic Software