
Combining data augmentation and domain information with TENER model for Clinical Event Detection

BACKGROUND: In recent years, with the development of artificial intelligence, the use of deep learning technology for clinical information extraction has become a new trend. Clinical Event Detection (CED), one of its subtasks, has attracted attention from academia and industry. However, directly applyi...


Bibliographic Details
Main authors:	Zhang, Zhichang, Liu, Dan, Zhang, Minyu, Qin, Xiaohui
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596895/ https://www.ncbi.nlm.nih.gov/pubmed/34789246 http://dx.doi.org/10.1186/s12911-021-01618-3

_version_	1784600492052054016
author	Zhang, Zhichang Liu, Dan Zhang, Minyu Qin, Xiaohui
author_sort	Zhang, Zhichang
collection	PubMed
description	BACKGROUND: In recent years, with the development of artificial intelligence, the use of deep learning technology for clinical information extraction has become a new trend. Clinical Event Detection (CED), one of its subtasks, has attracted attention from academia and industry. However, directly applying advances in deep learning to the CED task often yields unsatisfactory results, for two main reasons: (1) The large number of obscure professional terms in electronic medical records leads to poor recognition performance. (2) The scarcity of datasets for the task leads to poor model robustness. Solving these two problems is therefore essential to improving model performance. METHODS: This paper proposes a model that combines data augmentation and domain information with the TENER model for Clinical Event Detection. RESULTS: We use two evaluation metrics to compare the overall performance of the proposed model with existing models on the 2012 i2b2 challenge dataset. Experimental results demonstrate that the proposed model achieves the best F1-score of 80.26%, type accuracy of 93% and span F1-score of 90.33%, outperforming state-of-the-art approaches. CONCLUSIONS: This paper proposes a multi-granularity information fusion encoder-decoder framework, which applies the TENER model to the CED task for the first time. It uses a pre-trained language model (BioBERT) to generate word-level features, addressing the poor recognition performance caused by the large number of obscure professional terms in electronic medical records. In addition, this paper proposes a new data augmentation method for sequence labeling tasks, addressing the poor model robustness caused by the scarcity of datasets for the task.
format	Online Article Text
id	pubmed-8596895
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8596895 2021-11-17 Combining data augmentation and domain information with TENER model for Clinical Event Detection Zhang, Zhichang Liu, Dan Zhang, Minyu Qin, Xiaohui BMC Med Inform Decis Mak Research BioMed Central 2021-11-16 /pmc/articles/PMC8596895/ /pubmed/34789246 http://dx.doi.org/10.1186/s12911-021-01618-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Combining data augmentation and domain information with TENER model for Clinical Event Detection
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596895/ https://www.ncbi.nlm.nih.gov/pubmed/34789246 http://dx.doi.org/10.1186/s12911-021-01618-3


Similar items