Entity relation extraction in the medical domain: based on data augmentation
BACKGROUND: Entity relation extraction is an important task in the construction of professional knowledge graphs in the medical field. Research on entity relation extraction for academic books in the medical field has revealed that there is a great difference in the number of different entity relati...
Main authors: | Wang, Anli; Li, Linyi; Wu, Xuehong; Zhu, Jianping; Yu, Shanshan; Chen, Xi; Li, Jianhua; Zhu, Hongtao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | AME Publishing Company 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9622485/ https://www.ncbi.nlm.nih.gov/pubmed/36330405 http://dx.doi.org/10.21037/atm-22-3991 |
_version_ | 1784821779787677696 |
---|---|
author | Wang, Anli Li, Linyi Wu, Xuehong Zhu, Jianping Yu, Shanshan Chen, Xi Li, Jianhua Zhu, Hongtao |
author_facet | Wang, Anli Li, Linyi Wu, Xuehong Zhu, Jianping Yu, Shanshan Chen, Xi Li, Jianhua Zhu, Hongtao |
author_sort | Wang, Anli |
collection | PubMed |
description | BACKGROUND: Entity relation extraction is an important task in the construction of professional knowledge graphs in the medical field. Research on entity relation extraction for academic books in the medical field has revealed that there is a great difference in the number of different entity relations, which has led to the formation of a typical unbalanced data set that is difficult to recognize but has certain research value. METHODS: In this article, we propose a new entity relation extraction method based on data augmentation. According to the distribution of individual entity relation classes in the data set, the probability of whether a text is augmented during training was calculated. In text-oriented data augmentation, different augmentation methods perform differently in different language environments. The reinforcement of learning determines which data augmentation method to use in the current language environment. This strategy was applied to the entity relation extraction of the medical professional book, Pharmacopoeia of the People’s Republic of China, and different data augmentation methods (i.e., no data augmentation, traditional data augmentation, and reinforcement learning-based data augmentation) were compared under the same neural network model. RESULTS: The deep-learning model using data augmentation was better than the model without data augmentation, as data augmentation significantly improved the evaluation indicators of the relation classes with low data volumes in the unbalanced data set and slightly improved the evaluation indicators of the relation classes with sufficient features and large data volumes. Additionally, the deep-learning model using reinforcement learning-based data augmentation was superior to the deep-learning model using traditional data augmentation. We found that after the application of reinforcement learning-based data augmentation, the evaluation indicators of the multiple relation classes were much better than those to which reinforcement learning-based data augmentation had not been applied. CONCLUSIONS: For unbalanced data sets, data augmentation can effectively improve the ability of the deep-learning model to obtain data features, and reinforcement learning-based data augmentation can further enhance this ability. Our experiments confirmed the superiority of reinforcement learning-based data augmentation. |
format | Online Article Text |
id | pubmed-9622485 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | AME Publishing Company |
record_format | MEDLINE/PubMed |
spelling | pubmed-9622485 2022-11-02 Entity relation extraction in the medical domain: based on data augmentation Wang, Anli Li, Linyi Wu, Xuehong Zhu, Jianping Yu, Shanshan Chen, Xi Li, Jianhua Zhu, Hongtao Ann Transl Med Original Article BACKGROUND: Entity relation extraction is an important task in the construction of professional knowledge graphs in the medical field. Research on entity relation extraction for academic books in the medical field has revealed that there is a great difference in the number of different entity relations, which has led to the formation of a typical unbalanced data set that is difficult to recognize but has certain research value. METHODS: In this article, we propose a new entity relation extraction method based on data augmentation. According to the distribution of individual entity relation classes in the data set, the probability of whether a text is augmented during training was calculated. In text-oriented data augmentation, different augmentation methods perform differently in different language environments. The reinforcement of learning determines which data augmentation method to use in the current language environment. This strategy was applied to the entity relation extraction of the medical professional book, Pharmacopoeia of the People’s Republic of China, and different data augmentation methods (i.e., no data augmentation, traditional data augmentation, and reinforcement learning-based data augmentation) were compared under the same neural network model. RESULTS: The deep-learning model using data augmentation was better than the model without data augmentation, as data augmentation significantly improved the evaluation indicators of the relation classes with low data volumes in the unbalanced data set and slightly improved the evaluation indicators of the relation classes with sufficient features and large data volumes. Additionally, the deep-learning model using reinforcement learning-based data augmentation was superior to the deep-learning model using traditional data augmentation. We found that after the application of reinforcement learning-based data augmentation, the evaluation indicators of the multiple relation classes were much better than those to which reinforcement learning-based data augmentation had not been applied. CONCLUSIONS: For unbalanced data sets, data augmentation can effectively improve the ability of the deep-learning model to obtain data features, and reinforcement learning-based data augmentation can further enhance this ability. Our experiments confirmed the superiority of reinforcement learning-based data augmentation. AME Publishing Company 2022-10 /pmc/articles/PMC9622485/ /pubmed/36330405 http://dx.doi.org/10.21037/atm-22-3991 Text en 2022 Annals of Translational Medicine. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Original Article Wang, Anli Li, Linyi Wu, Xuehong Zhu, Jianping Yu, Shanshan Chen, Xi Li, Jianhua Zhu, Hongtao Entity relation extraction in the medical domain: based on data augmentation |
title | Entity relation extraction in the medical domain: based on data augmentation |
title_full | Entity relation extraction in the medical domain: based on data augmentation |
title_fullStr | Entity relation extraction in the medical domain: based on data augmentation |
title_full_unstemmed | Entity relation extraction in the medical domain: based on data augmentation |
title_short | Entity relation extraction in the medical domain: based on data augmentation |
title_sort | entity relation extraction in the medical domain: based on data augmentation |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9622485/ https://www.ncbi.nlm.nih.gov/pubmed/36330405 http://dx.doi.org/10.21037/atm-22-3991 |
work_keys_str_mv | AT wanganli entityrelationextractioninthemedicaldomainbasedondataaugmentation AT lilinyi entityrelationextractioninthemedicaldomainbasedondataaugmentation AT wuxuehong entityrelationextractioninthemedicaldomainbasedondataaugmentation AT zhujianping entityrelationextractioninthemedicaldomainbasedondataaugmentation AT yushanshan entityrelationextractioninthemedicaldomainbasedondataaugmentation AT chenxi entityrelationextractioninthemedicaldomainbasedondataaugmentation AT lijianhua entityrelationextractioninthemedicaldomainbasedondataaugmentation AT zhuhongtao entityrelationextractioninthemedicaldomainbasedondataaugmentation |
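
The abstract in this record describes two steps: computing, from the class distribution, the probability that a text is augmented during training, and letting reinforcement learning pick the augmentation method. The record gives no formulas, so the following Python is only a rough sketch under stated assumptions. The probability rule `max_prob * (1 - count / max_count)`, the epsilon-greedy bandit standing in for the authors' reinforcement learning component, and all class and method names (`indication`, `synonym_replacement`, and so on) are hypothetical illustrations, not the method from the article.

```python
import random
from collections import Counter


def augmentation_probabilities(labels, max_prob=0.9):
    """Map each relation class to a probability of augmenting one of its texts.

    Assumed rule (not from the article): rarer classes get a higher
    probability, p(c) = max_prob * (1 - count(c) / count(largest class)).
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {c: max_prob * (1 - n / largest) for c, n in counts.items()}


class EpsilonGreedySelector:
    """Simple bandit that picks an augmentation method.

    A stand-in for the reinforcement learning component; the article's
    actual agent, state, and reward are not described in this record.
    """

    def __init__(self, methods, epsilon=0.1, seed=0):
        self.methods = list(methods)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {m: 0 for m in self.methods}
        self.values = {m: 0.0 for m in self.methods}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.methods)
        return max(self.methods, key=lambda m: self.values[m])

    def update(self, method, reward):
        # Incremental mean of the rewards observed for this method.
        self.counts[method] += 1
        n = self.counts[method]
        self.values[method] += (reward - self.values[method]) / n


# Hypothetical unbalanced label distribution.
labels = ["indication"] * 80 + ["dosage"] * 15 + ["contraindication"] * 5
probs = augmentation_probabilities(labels)

selector = EpsilonGreedySelector(["synonym_replacement", "random_swap"])
method = selector.choose()
# The reward would come from a validation metric; 1.0 is a placeholder.
selector.update(method, reward=1.0)
```

In this sketch the largest class is never augmented and the smallest class is augmented most often, which mirrors the abstract's focus on relation classes with low data volumes. Any monotone decreasing function of class frequency would serve the same illustrative purpose.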