
BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets

OBJECTIVE: Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, su...


Bibliographic Details
Main Authors:	Lai, Po-Ting; Wei, Chih-Hsuan; Luo, Ling; Chen, Qingyu; Lu, Zhiyong
Format:	Online Article Text
Language:	English
Published:	Cornell University 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370213/ https://www.ncbi.nlm.nih.gov/pubmed/37502629

_version_	1785077904659447808
author	Lai, Po-Ting Wei, Chih-Hsuan Luo, Ling Chen, Qingyu Lu, Zhiyong
author_facet	Lai, Po-Ting Wei, Chih-Hsuan Luo, Ling Chen, Qingyu Lu, Zhiyong
author_sort	Lai, Po-Ting
collection	PubMed
description	OBJECTIVE: Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods were used primarily to train machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. METHODS: In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. RESULTS AND CONCLUSION: Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, setting a new SOTA from 74.4% to 79.6% in F-1 measure on the recently released BioRED corpus. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx’s robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.
format	Online Article Text
id	pubmed-10370213
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cornell University
record_format	MEDLINE/PubMed
spelling	pubmed-10370213 2023-07-27 BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets Lai, Po-Ting Wei, Chih-Hsuan Luo, Ling Chen, Qingyu Lu, Zhiyong ArXiv Article Cornell University 2023-06-19 /pmc/articles/PMC10370213/ /pubmed/37502629 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle	Article Lai, Po-Ting Wei, Chih-Hsuan Luo, Ling Chen, Qingyu Lu, Zhiyong BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets
title	BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets
title_sort	biorex: improving biomedical relation extraction by leveraging heterogeneous datasets
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370213/ https://www.ncbi.nlm.nih.gov/pubmed/37502629
work_keys_str_mv	AT laipoting bioreximprovingbiomedicalrelationextractionbyleveragingheterogeneousdatasets AT weichihhsuan bioreximprovingbiomedicalrelationextractionbyleveragingheterogeneousdatasets AT luoling bioreximprovingbiomedicalrelationextractionbyleveragingheterogeneousdatasets AT chenqingyu bioreximprovingbiomedicalrelationextractionbyleveragingheterogeneousdatasets AT luzhiyong bioreximprovingbiomedicalrelationextractionbyleveragingheterogeneousdatasets


Similar Items