Subsequence and distant supervision based active learning for relation extraction of Chinese medical texts
Main authors: | Ye, Qi; Cai, Tingting; Ji, Xiang; Ruan, Tong; Zheng, Hong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9926422/ https://www.ncbi.nlm.nih.gov/pubmed/36788504 http://dx.doi.org/10.1186/s12911-023-02127-1 |
author | Ye, Qi; Cai, Tingting; Ji, Xiang; Ruan, Tong; Zheng, Hong |
---|---|
collection | PubMed |
description | In recent years, relation extraction from unstructured text has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical annotation methods are therefore needed to maintain relation extraction performance. This paper proposes an active learning method based on subsequences and distant supervision. Instead of annotating full sentences, as in traditional active learning, the method selects information-rich subsequences as the sampling unit. In addition, it stores the labeled subsequence texts and their labels in a continuously updated dictionary and, following the idea of distant supervision, pre-labels the unlabeled set by text matching against this dictionary. Finally, the method is combined with a Chinese-RoBERTa-CRF model for relation extraction from Chinese medical texts. Experiments on the CMeIE dataset show that the method achieves the best performance compared with existing methods, and the best F1 score across the different sampling strategies is 55.96%. |
format | Online Article Text |
id | pubmed-9926422 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9926422, 2023-02-14. BMC Med Inform Decis Mak, Research. BioMed Central, 2023-02-14. /pmc/articles/PMC9926422/ /pubmed/36788504 http://dx.doi.org/10.1186/s12911-023-02127-1. Text, en. © The Author(s) 2023. https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Subsequence and distant supervision based active learning for relation extraction of Chinese medical texts |
topic | Research |
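The description above names two mechanisms: sampling information-rich subsequences instead of full sentences, and keeping a dictionary of labeled subsequences that is used to pre-label unlabeled text by string matching (distant supervision). The record gives no implementation details, so the Python below is only a minimal sketch under stated assumptions, not the authors' code. It assumes one tag per character, scores candidate subsequences as a fixed-width window with the highest summed model uncertainty (a stand-in for the paper's unspecified sampling strategies), and uses exact substring matching in which longer dictionary entries take precedence. The functions `most_uncertain_window`, `update_dictionary`, and `pre_label`, and the tag names such as `B-DIS`, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical tag scheme: one tag per character (Chinese text is usually
# tagged per character); None marks a character that is not yet labeled.
Tags = List[Optional[str]]


def most_uncertain_window(uncertainty: Sequence[float], width: int) -> Tuple[int, int]:
    """Return (start, end) of the contiguous window of `width` tokens with the
    highest total uncertainty. Assumed stand-in for selecting an
    'information-rich subsequence'; the paper's scoring may differ."""
    if width <= 0 or width > len(uncertainty):
        raise ValueError("width must be between 1 and len(uncertainty)")
    window = sum(uncertainty[:width])
    best_sum, best_start = window, 0
    for start in range(1, len(uncertainty) - width + 1):
        # Slide one step: add the new rightmost token, drop the old leftmost one.
        window += uncertainty[start + width - 1] - uncertainty[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start, best_start + width


def update_dictionary(dictionary: Dict[str, List[str]], text: str, tags: List[str]) -> None:
    """Store a newly annotated subsequence and its per-character tags
    (hypothetical helper for the continuously updated dictionary)."""
    if len(text) != len(tags):
        raise ValueError("need exactly one tag per character")
    dictionary[text] = list(tags)


def pre_label(sentence: str, dictionary: Dict[str, List[str]]) -> Tags:
    """Distant-supervision-style pre-labeling (assumed exact matching):
    wherever a dictionary entry occurs verbatim in `sentence`, copy its tags
    onto those characters. Longer entries are matched first, and a match is
    skipped if any of its characters is already labeled."""
    tags: Tags = [None] * len(sentence)
    for key in sorted(dictionary, key=len, reverse=True):
        start = sentence.find(key)
        while start != -1:
            span = range(start, start + len(key))
            if all(tags[i] is None for i in span):
                for i, tag in zip(span, dictionary[key]):
                    tags[i] = tag
            start = sentence.find(key, start + 1)
    return tags


if __name__ == "__main__":
    lexicon: Dict[str, List[str]] = {}
    update_dictionary(lexicon, "高血压", ["B-DIS", "I-DIS", "I-DIS"])
    update_dictionary(lexicon, "阿司匹林", ["B-DRUG", "I-DRUG", "I-DRUG", "I-DRUG"])
    print(pre_label("高血压患者服用阿司匹林", lexicon))
    print(most_uncertain_window([0.1, 0.9, 0.8, 0.2, 0.7], width=2))  # (1, 3)
```

In this sketch only the characters covered by dictionary matches receive tags; the rest stay `None`, which is where a model such as the Chinese-RoBERTa-CRF tagger or a human annotator would still have to decide.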