Entity relation extraction in the medical domain: based on data augmentation

BACKGROUND: Entity relation extraction is an important task in the construction of professional knowledge graphs in the medical field. Research on entity relation extraction for academic books in the medical field has revealed that there is a great difference in the number of different entity relati...

Bibliographic Details
Main authors: Wang, Anli; Li, Linyi; Wu, Xuehong; Zhu, Jianping; Yu, Shanshan; Chen, Xi; Li, Jianhua; Zhu, Hongtao
Format: Online Article Text
Language: English
Published: AME Publishing Company 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9622485/
https://www.ncbi.nlm.nih.gov/pubmed/36330405
http://dx.doi.org/10.21037/atm-22-3991
author Wang, Anli
Li, Linyi
Wu, Xuehong
Zhu, Jianping
Yu, Shanshan
Chen, Xi
Li, Jianhua
Zhu, Hongtao
collection PubMed
description BACKGROUND: Entity relation extraction is an important task in the construction of professional knowledge graphs in the medical field. Research on entity relation extraction for academic books in the medical field has revealed that there is a great difference in the number of different entity relations, which has led to the formation of a typical unbalanced data set that is difficult to recognize but has certain research value. METHODS: In this article, we propose a new entity relation extraction method based on data augmentation. According to the distribution of individual entity relation classes in the data set, the probability that a text is augmented during training is calculated. In text-oriented data augmentation, different augmentation methods perform differently in different language environments. Reinforcement learning determines which data augmentation method to use in the current language environment. This strategy was applied to the entity relation extraction of the medical professional book, Pharmacopoeia of the People’s Republic of China, and different data augmentation methods (i.e., no data augmentation, traditional data augmentation, and reinforcement learning-based data augmentation) were compared under the same neural network model. RESULTS: The deep-learning model using data augmentation was better than the model without data augmentation, as data augmentation significantly improved the evaluation indicators of the relation classes with low data volumes in the unbalanced data set and slightly improved the evaluation indicators of the relation classes with sufficient features and large data volumes. Additionally, the deep-learning model using reinforcement learning-based data augmentation was superior to the deep-learning model using traditional data augmentation.
We found that after the application of reinforcement learning-based data augmentation, the evaluation indicators of the multiple relation classes were much better than those to which reinforcement learning-based data augmentation had not been applied. CONCLUSIONS: For unbalanced data sets, data augmentation can effectively improve the ability of the deep-learning model to obtain data features, and reinforcement learning-based data augmentation can further enhance this ability. Our experiments confirmed the superiority of reinforcement learning-based data augmentation.
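The METHODS summary above names two mechanisms: a per-class probability of augmenting a text, derived from the class distribution, and a learned choice of augmentation method. The record does not give the formulas, so the following is only a minimal sketch under stated assumptions. The rule p = 1 - count / max_count, the epsilon-greedy selector (a simple bandit swapped in for the article's reinforcement-learning component), and all function, class, and method names are hypothetical and are not taken from the article.

```python
import random
from collections import Counter


def augmentation_probabilities(labels, floor=0.0):
    """Map each relation class to a probability of augmenting one of its texts.

    Assumed rule (not from the article): p = 1 - count / max_count, so the
    largest class is never augmented and rarer classes are augmented more often.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: max(floor, 1.0 - n / largest) for cls, n in counts.items()}


class EpsilonGreedySelector:
    """Simplified stand-in for a reinforcement-learning method selector.

    Keeps a running mean reward per augmentation method and usually picks
    the method with the highest mean, exploring at rate epsilon.
    """

    def __init__(self, methods, epsilon=0.1, seed=0):
        self.methods = list(methods)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {m: 0 for m in self.methods}
        self.value = {m: 0.0 for m in self.methods}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.methods)
        return max(self.methods, key=lambda m: self.value[m])

    def update(self, method, reward):
        # Incremental mean of the rewards observed for this method.
        self.pulls[method] += 1
        self.value[method] += (reward - self.value[method]) / self.pulls[method]


# Hypothetical usage: relation labels and method names are illustrative only.
probs = augmentation_probabilities(["dosage"] * 8 + ["contraindication"] * 2)
selector = EpsilonGreedySelector(["synonym_replacement", "random_swap"], epsilon=0.0)
selector.update("random_swap", 1.0)  # the reward could be a validation-score gain
chosen = selector.choose()
```

In a training loop, one would draw a random number per text, augment it when the draw falls below its class probability, and feed some measure of downstream improvement back through `update`; the choice of reward signal is left open here because the record does not specify it.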
format Online
Article
Text
id pubmed-9622485
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher AME Publishing Company
record_format MEDLINE/PubMed
spelling pubmed-9622485 2022-11-02 Ann Transl Med Original Article AME Publishing Company 2022-10 /pmc/articles/PMC9622485/ /pubmed/36330405 http://dx.doi.org/10.21037/atm-22-3991 Text en 2022 Annals of Translational Medicine. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/
title Entity relation extraction in the medical domain: based on data augmentation
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9622485/
https://www.ncbi.nlm.nih.gov/pubmed/36330405
http://dx.doi.org/10.21037/atm-22-3991