Combining data augmentation and domain information with TENER model for Clinical Event Detection
Main authors: | Zhang, Zhichang; Liu, Dan; Zhang, Minyu; Qin, Xiaohui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596895/ https://www.ncbi.nlm.nih.gov/pubmed/34789246 http://dx.doi.org/10.1186/s12911-021-01618-3 |
_version_ | 1784600492052054016 |
---|---|
author | Zhang, Zhichang Liu, Dan Zhang, Minyu Qin, Xiaohui |
collection | PubMed |
description | BACKGROUND: In recent years, with the development of artificial intelligence, using deep learning for clinical information extraction has become a new trend. Clinical Event Detection (CED), one of its subtasks, has attracted attention from both academia and industry. However, directly applying advances in deep learning to the CED task often yields unsatisfactory results, for two main reasons: (1) the large number of obscure technical terms in electronic medical records leads to poor recognition performance; (2) the scarcity of datasets for the task leads to poor model robustness. Solving these two problems is therefore essential to improving model performance. METHODS: This paper proposes a model for Clinical Event Detection that combines data augmentation and domain information with the TENER model. RESULTS: We use two evaluation metrics to compare the overall performance of the proposed model with existing models on the 2012 i2b2 challenge dataset. Experimental results show that the proposed model achieves the best F1-score of 80.26%, a type accuracy of 93% and a Span F1-score of 90.33%, outperforming state-of-the-art approaches. CONCLUSIONS: This paper proposes a multi-granularity information fusion encoder-decoder framework that applies the TENER model to the CED task for the first time. It uses the pre-trained language model BioBERT to generate word-level features, solving the problem of poor recognition performance caused by the large number of obscure technical terms in electronic medical records. In addition, it proposes a new data augmentation method for sequence labeling tasks, solving the problem of poor model robustness caused by the scarcity of datasets for the task. |
format | Online Article Text |
id | pubmed-8596895 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8596895 2021-11-17 BMC Med Inform Decis Mak Research BioMed Central 2021-11-16 /pmc/articles/PMC8596895/ /pubmed/34789246 http://dx.doi.org/10.1186/s12911-021-01618-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Combining data augmentation and domain information with TENER model for Clinical Event Detection |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596895/ https://www.ncbi.nlm.nih.gov/pubmed/34789246 http://dx.doi.org/10.1186/s12911-021-01618-3 |
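The RESULTS section reports an F1-score, a type accuracy and a Span F1-score, but the record does not define them. The sketch below assumes one common reading for BIO-tagged event spans: F1 counts a predicted event as correct only if both its boundaries and its type match the gold event; Span F1 requires only matching boundaries; type accuracy is the share of boundary-matched events that also carry the correct type. The BIO scheme, the example labels (PROBLEM, TREATMENT, TEST) and the helper names `bio_to_spans` and `evaluate` are illustrative assumptions, not taken from the paper, and this is not the official i2b2 2012 scorer, which may match spans differently.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical span representation: (start, end_exclusive, event_type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of typed spans.

    Assumption: an I- tag that does not continue a span of the same type
    opens a new span (a lenient, commonly used convention).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # Appending a sentinel "O" closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        opens_or_breaks = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if opens_or_breaks:
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall, 0.0 when undefined."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: List[List[str]], pred: List[List[str]]) -> Dict[str, float]:
    """Compute exact F1, boundary-only Span F1 and type accuracy.

    gold and pred are parallel lists of per-sentence BIO tag sequences.
    """
    exact_tp = span_tp = typed_correct = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        n_gold += len(g)
        n_pred += len(p)
        exact_tp += len(g & p)  # boundaries and type both match
        # Map boundaries to type; BIO decoding yields at most one span per boundary pair.
        g_bounds = {(s, e): t for s, e, t in g}
        p_bounds = {(s, e): t for s, e, t in p}
        matched = g_bounds.keys() & p_bounds.keys()
        span_tp += len(matched)  # boundaries match, type ignored
        typed_correct += sum(g_bounds[b] == p_bounds[b] for b in matched)
    return {
        "f1": f1(exact_tp, n_pred, n_gold),
        "span_f1": f1(span_tp, n_pred, n_gold),
        "type_accuracy": typed_correct / span_tp if span_tp else 0.0,
    }


if __name__ == "__main__":
    gold = [["B-PROBLEM", "I-PROBLEM", "O", "B-TREATMENT"]]
    pred = [["B-PROBLEM", "I-PROBLEM", "O", "B-TEST"]]
    print(evaluate(gold, pred))  # {'f1': 0.5, 'span_f1': 1.0, 'type_accuracy': 0.5}
```

In the usage example both events are found with the right boundaries, so Span F1 is perfect, but one carries the wrong type, which halves both the exact F1 and the type accuracy. Under this reading, a Span F1 well above the exact F1, as in the reported 90.33% versus 80.26%, would indicate that type assignment is a larger error source than boundary detection.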