Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE)

The Transformer-based approaches to solving natural language processing (NLP) tasks such as BERT and GPT are gaining popularity due to their ability to achieve high performance. These approaches benefit from using enormous data sizes to create pre-trained models and the ability to understand the con...

Bibliographic Details
Main authors: Munarko, Yuda; Rampadarath, Anand; Nickerson, David
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10570691/
https://www.ncbi.nlm.nih.gov/pubmed/37842339
http://dx.doi.org/10.12688/f1000research.128982.1
author Munarko, Yuda
Rampadarath, Anand
Nickerson, David
collection PubMed
description The Transformer-based approaches to solving natural language processing (NLP) tasks such as BERT and GPT are gaining popularity due to their ability to achieve high performance. These approaches benefit from using enormous data sizes to create pre-trained models and the ability to understand the context of words in a sentence. Their use in the information retrieval domain is thought to increase effectiveness and efficiency. This paper demonstrates a BERT-based method (CASBERT) implementation to build a search tool over data annotated compositely using ontologies. The data was a collection of biosimulation models written using the CellML standard in the Physiome Model Repository (PMR). A biosimulation model structurally consists of basic entities of constants and variables that construct higher-level entities such as components, reactions, and the model. Finding these entities specific to their level is beneficial for various purposes regarding variable reuse, experiment setup, and model audit. Initially, we created embeddings representing compositely-annotated entities for constant and variable search (lowest level entity). Then, these low-level entity embeddings were vertically and efficiently combined to create higher-level entity embeddings to search components, models, images, and simulation setups. Our approach was general, so it can be used to create search tools with other data semantically annotated with ontologies - biosimulation models encoded in the SBML format, for example. Our tool is named Biosimulation Model Search Engine (BMSE).
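The description above outlines a two-step idea: embed the lowest-level entities (constants and variables) from their composite annotations, then combine those embeddings into higher-level entity embeddings that can be searched the same way. The following is a minimal sketch of that idea only, not the authors' CASBERT implementation. The toy 3-dimensional vectors, the entity names, the helper functions, and the use of mean pooling and cosine similarity are all assumptions made for illustration; in practice the term vectors would come from a BERT-style encoder.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (assumed combination rule)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, top_k=3):
    """Rank the entities in `index` (name -> vector) by similarity to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]

# Hypothetical embeddings of ontology terms (made-up toy values).
term_vecs = {
    "sodium": [1.0, 0.0, 0.0],
    "concentration": [0.0, 1.0, 0.0],
    "membrane": [0.0, 0.0, 1.0],
    "potential": [0.0, 1.0, 1.0],
}

# Lowest level: each variable is annotated with several terms (composite annotation).
variables = {
    "Na_conc": mean_vector([term_vecs["sodium"], term_vecs["concentration"]]),
    "V_m": mean_vector([term_vecs["membrane"], term_vecs["potential"]]),
}

# Higher level: a component embedding built from the embeddings of its variables.
component = {"membrane_component": mean_vector(list(variables.values()))}

# A query is embedded the same way, then matched against the chosen level.
query = mean_vector([term_vecs["sodium"], term_vecs["concentration"]])
results = search(query, variables, top_k=1)
print(results[0][1])
```

Because every level shares one vector space in this sketch, the same `search` function can be pointed at `variables` or at `component` without change, which is one plausible reading of how level-specific search could be organised.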
format Online
Article
Text
id pubmed-10570691
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-10570691 2023-10-14 F1000Res Software Tool Article F1000 Research Limited 2023-02-10 /pmc/articles/PMC10570691/ /pubmed/37842339 http://dx.doi.org/10.12688/f1000research.128982.1 Text en Copyright: © 2023 Munarko Y et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Building a search tool for compositely annotated entities using Transformer-based approach: Case study in Biosimulation Model Search Engine (BMSE)
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10570691/
https://www.ncbi.nlm.nih.gov/pubmed/37842339
http://dx.doi.org/10.12688/f1000research.128982.1