
Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning

MOTIVATION: Few-shot learning that can effectively perform named entity recognition in low-resource scenarios has raised growing attention, but it has not been widely studied yet in the biomedical field. In contrast to high-resource domains, biomedical named entity recognition (BioNER) often encount...


Bibliographic Details
Main Authors: Chen, Peng; Wang, Jian; Lin, Hongfei; Zhao, Di; Yang, Zhihao
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10444965/
https://www.ncbi.nlm.nih.gov/pubmed/37549065
http://dx.doi.org/10.1093/bioinformatics/btad496
author Chen, Peng
Wang, Jian
Lin, Hongfei
Zhao, Di
Yang, Zhihao
collection PubMed
description MOTIVATION: Few-shot learning that can effectively perform named entity recognition in low-resource scenarios has attracted growing attention, but it has not yet been widely studied in the biomedical field. In contrast to high-resource domains, biomedical named entity recognition (BioNER) often encounters limited human-labeled data in real-world scenarios, leading to poor generalization performance when training on only a few labeled instances. Recent approaches either leverage cross-domain high-resource data or fine-tune a pre-trained masked language model on limited labeled samples to generate new synthetic data; the former is prone to domain shift and the latter yields low-quality synthetic data. Therefore, in this article, we study a more realistic scenario, i.e. few-shot learning for BioNER.
RESULTS: Leveraging a domain knowledge graph, we propose knowledge-guided instance generation for few-shot BioNER, which generates diverse and novel entities based on similar semantic relations of neighbor nodes. In addition, by introducing a question prompt, we cast BioNER as a question-answering task and propose prompt contrastive learning to improve the robustness of the model by measuring the mutual information between query–answer pairs. Extensive experiments conducted on various few-shot settings show that the proposed framework achieves superior performance. In particular, in a low-resource scenario with only 20 samples, our approach substantially outperforms recent state-of-the-art models on four benchmark datasets, achieving an average improvement of up to 7.1% F1.
AVAILABILITY AND IMPLEMENTATION: Our source code and data are available at https://github.com/cpmss521/KGPC.
format Online
Article
Text
id pubmed-10444965
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10444965 2023-08-24 Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning Chen, Peng; Wang, Jian; Lin, Hongfei; Zhao, Di; Yang, Zhihao. Bioinformatics. Original Paper.
Oxford University Press 2023-08-07 /pmc/articles/PMC10444965/ /pubmed/37549065 http://dx.doi.org/10.1093/bioinformatics/btad496 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Few-shot biomedical named entity recognition via knowledge-guided instance generation and prompt contrastive learning
topic Original Paper