Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models
BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and unders...
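Main authors: | Millikin, Robert J.; Raja, Kalpana; Steill, John; Lock, Cannon; Tu, Xuancheng; Ross, Ian; Tsoi, Lam C.; Kuusisto, Finn; Ni, Zijian; Livny, Miron; Bockelman, Brian; Thomson, James; Stewart, Ron |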
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10619245/ https://www.ncbi.nlm.nih.gov/pubmed/37915001 http://dx.doi.org/10.1186/s12859-023-05539-y |
_version_ | 1785129945441239040 |
---|---|
author | Millikin, Robert J. Raja, Kalpana Steill, John Lock, Cannon Tu, Xuancheng Ross, Ian Tsoi, Lam C. Kuusisto, Finn Ni, Zijian Livny, Miron Bockelman, Brian Thomson, James Stewart, Ron |
author_facet | Millikin, Robert J. Raja, Kalpana Steill, John Lock, Cannon Tu, Xuancheng Ross, Ian Tsoi, Lam C. Kuusisto, Finn Ni, Zijian Livny, Miron Bockelman, Brian Thomson, James Stewart, Ron |
author_sort | Millikin, Robert J. |
collection | PubMed |
description | BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A–B–C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. RESULTS: We demonstrate SKiM’s ability to discover useful A–B–C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. 
Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. CONCLUSIONS: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05539-y. |
format | Online Article Text |
id | pubmed-10619245 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-106192452023-11-02 Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models Millikin, Robert J. Raja, Kalpana Steill, John Lock, Cannon Tu, Xuancheng Ross, Ian Tsoi, Lam C. Kuusisto, Finn Ni, Zijian Livny, Miron Bockelman, Brian Thomson, James Stewart, Ron BMC Bioinformatics Software BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A–B–C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. 
RESULTS: We demonstrate SKiM’s ability to discover useful A–B–C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. CONCLUSIONS: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05539-y. BioMed Central 2023-11-01 /pmc/articles/PMC10619245/ /pubmed/37915001 http://dx.doi.org/10.1186/s12859-023-05539-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. 
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Millikin, Robert J. Raja, Kalpana Steill, John Lock, Cannon Tu, Xuancheng Ross, Ian Tsoi, Lam C. Kuusisto, Finn Ni, Zijian Livny, Miron Bockelman, Brian Thomson, James Stewart, Ron Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models |
title | Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models |
title_full | Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models |
title_fullStr | Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models |
title_full_unstemmed | Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models |
title_short | Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models |
title_sort | serial kinderminer (skim) discovers and annotates biomedical knowledge using co-occurrence and transformer models |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10619245/ https://www.ncbi.nlm.nih.gov/pubmed/37915001 http://dx.doi.org/10.1186/s12859-023-05539-y |
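The abstract describes an A–B–C search: find B terms significantly linked to an A term, then find C terms significantly linked to those B terms. The record does not state which statistical test or data structures SKiM uses, so the following is only a minimal sketch under stated assumptions: a one-sided Fisher's exact test on document co-occurrence counts, a 0.05 cutoff, and a made-up toy corpus. All function names (`fisher_right_tail`, `cooccurrence_p`, `abc_links`) are hypothetical and are not taken from the SKiM code base.

```python
from math import comb


def fisher_right_tail(both, x_only, y_only, neither):
    """One-sided Fisher's exact p-value: P(overlap >= both) under independence.

    Assumption: this test is chosen for illustration; the record does not
    say which significance test SKiM applies.
    """
    n_x = both + x_only            # documents mentioning x
    n_y = both + y_only            # documents mentioning y
    total = n_x + y_only + neither
    tail = sum(
        comb(n_x, k) * comb(total - n_x, n_y - k)
        for k in range(both, min(n_x, n_y) + 1)
    )
    return tail / comb(total, n_y)


def cooccurrence_p(docs, x, y):
    """p-value for terms x and y co-occurring more often than chance."""
    both = sum(1 for d in docs if x in d and y in d)
    x_only = sum(1 for d in docs if x in d and y not in d)
    y_only = sum(1 for d in docs if y in d and x not in d)
    neither = len(docs) - both - x_only - y_only
    return fisher_right_tail(both, x_only, y_only, neither)


def abc_links(docs, a_term, b_terms, c_terms, alpha=0.05):
    """Return (A, B, C) triples where both A-B and B-C pass the cutoff."""
    significant_b = [b for b in b_terms if cooccurrence_p(docs, a_term, b) < alpha]
    return [
        (a_term, b, c)
        for b in significant_b
        for c in c_terms
        if cooccurrence_p(docs, b, c) < alpha
    ]


# Toy corpus (invented): each document is the set of terms it mentions.
# The A and C terms never appear together, only through the B term.
TOY_DOCS = (
    [{"raynaud disease", "blood viscosity"}] * 5
    + [{"blood viscosity", "fish oil"}] * 5
    + [{"aspirin", "caffeine"}] * 2
    + [set()] * 8
)

links = abc_links(
    TOY_DOCS,
    a_term="raynaud disease",
    b_terms=["blood viscosity", "aspirin"],
    c_terms=["fish oil", "caffeine"],
)
print(links)
```

In this toy corpus the A and C terms never share a document, yet the search still surfaces them through the shared B term, which is the kind of indirect link the abstract attributes to literature-based discovery. A real search over many thousands of C terms would also need a multiple-testing correction, which this sketch omits.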