
Subsequence and distant supervision based active learning for relation extraction of Chinese medical texts

Bibliographic Details
Main authors: Ye, Qi; Cai, Tingting; Ji, Xiang; Ruan, Tong; Zheng, Hong
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926422/
https://www.ncbi.nlm.nih.gov/pubmed/36788504
http://dx.doi.org/10.1186/s12911-023-02127-1
author Ye, Qi
Cai, Tingting
Ji, Xiang
Ruan, Tong
Zheng, Hong
collection PubMed
description In recent years, relation extraction from unstructured texts has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical methods for annotating sequences are therefore required to ensure the performance of relation extraction. This paper proposes a subsequence and distant supervision based active learning method. Instead of the full sentences used in traditional active learning, the method selects information-rich subsequences as the sampling unit for annotation. Additionally, the method saves the labeled subsequence texts and their corresponding labels in a dictionary that is continuously updated and maintained, and pre-labels the unlabeled set through text matching based on the idea of distant supervision. Finally, the method incorporates a Chinese-RoBERTa-CRF model for relation extraction in Chinese medical texts. Experimental results on the CMeIE dataset show that the method achieves the best performance compared with existing methods, and the best F1 score obtained across the different sampling strategies is 55.96%.
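The description names two mechanisms: selecting information-rich subsequences rather than full sentences as the sampling unit, and a continuously updated dictionary of labeled subsequences used to pre-label unlabeled text by string matching. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes the tagger (for example a CRF) exposes per-character tag probabilities, uses least confidence averaged over a fixed-length window as a stand-in sampling score (the paper compares several sampling strategies, which are not reproduced here), and uses hypothetical names (`select_subsequence`, `SubsequenceDictionary`) and hypothetical BIO tags such as `B-DRUG`.

```python
from typing import Dict, List, Sequence, Tuple


def token_uncertainty(probs: Sequence[Sequence[float]]) -> List[float]:
    """Least-confidence score per character: 1 - max tag probability (assumed scoring)."""
    return [1.0 - max(p) for p in probs]


def select_subsequence(probs: Sequence[Sequence[float]], window: int) -> Tuple[int, int]:
    """Return the [start, end) span whose mean uncertainty is highest (hypothetical helper)."""
    scores = token_uncertainty(probs)
    if not scores or window < 1:
        raise ValueError("need at least one token and window >= 1")
    window = min(window, len(scores))
    best_start, best_score = 0, float("-inf")
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > best_score:  # strict '>' keeps the earliest span on ties
            best_start, best_score = start, mean
    return best_start, best_start + window


class SubsequenceDictionary:
    """Hypothetical store mapping annotated subsequences to their tag sequences."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[str, ...]] = {}

    def add(self, text: str, tags: Sequence[str]) -> None:
        # One tag per character, assuming Chinese text is labeled character by character.
        if not text or len(text) != len(tags):
            raise ValueError("text must be non-empty and have one tag per character")
        self.entries[text] = tuple(tags)  # a later annotation overwrites an earlier one

    def pre_label(self, sentence: str) -> List[str]:
        """Distant-supervision style pre-labeling by exact string matching."""
        labels = ["O"] * len(sentence)
        # Try longer entries first so they take precedence over shorter overlapping ones.
        for text in sorted(self.entries, key=len, reverse=True):
            tags = self.entries[text]
            start = sentence.find(text)
            while start != -1:
                end = start + len(text)
                # Only fill a span if every character in it is still tagged "O".
                if all(label == "O" for label in labels[start:end]):
                    labels[start:end] = tags
                start = sentence.find(text, start + 1)
        return labels


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5], [0.95, 0.05]]
    print(select_subsequence(probs, 2))  # → (1, 3)

    d = SubsequenceDictionary()
    d.add("二甲双胍", ["B-DRUG", "I-DRUG", "I-DRUG", "I-DRUG"])
    d.add("糖尿病", ["B-DIS", "I-DIS", "I-DIS"])
    print(d.pre_label("糖尿病患者服用二甲双胍"))
```

Matching longer dictionary entries first and refusing to overwrite characters that already carry a non-`O` tag is one simple way to resolve overlapping matches; the original method may resolve such conflicts differently, and pre-labels produced this way would still be reviewed by the annotator.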
format Online
Article
Text
id pubmed-9926422
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9926422 2023-02-14; BMC Med Inform Decis Mak; Research; BioMed Central 2023-02-14; /pmc/articles/PMC9926422/; /pubmed/36788504; http://dx.doi.org/10.1186/s12911-023-02127-1; Text; en; © The Author(s) 2023; https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Subsequence and distant supervision based active learning for relation extraction of Chinese medical texts
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926422/
https://www.ncbi.nlm.nih.gov/pubmed/36788504
http://dx.doi.org/10.1186/s12911-023-02127-1