Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study
BACKGROUND: The widespread adoption of electronic health records (EHRs) has improved the quality of health care. However, EHRs have also introduced problems, such as the growing use of copy-and-paste and templates, which results in records with low-quality content. To minimize data red...
Main Authors: | Xiong, Ying; Chen, Shuai; Chen, Qingcai; Yan, Jun; Tang, Buzhou |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7803475/ https://www.ncbi.nlm.nih.gov/pubmed/33372664 http://dx.doi.org/10.2196/23357 |
_version_ | 1783635945040379904 |
---|---|
author | Xiong, Ying Chen, Shuai Chen, Qingcai Yan, Jun Tang, Buzhou |
author_facet | Xiong, Ying Chen, Shuai Chen, Qingcai Yan, Jun Tang, Buzhou |
author_sort | Xiong, Ying |
collection | PubMed |
description | BACKGROUND: The widespread adoption of electronic health records (EHRs) has improved the quality of health care. However, EHRs have also introduced problems, such as the growing use of copy-and-paste and templates, which results in records with low-quality content. To minimize data redundancy across documents, Harvard Medical School and Mayo Clinic organized a national natural language processing (NLP) clinical challenge (n2c2) on clinical semantic textual similarity (ClinicalSTS) in 2019. The task of this challenge is to compute the semantic similarity between clinical text snippets. OBJECTIVE: In this study, we aim to investigate novel methods to model ClinicalSTS and analyze the results. METHODS: We propose a semantically enhanced text matching model for the 2019 n2c2/Open Health NLP (OHNLP) challenge on ClinicalSTS. The model includes 3 representation modules to encode clinical text snippet pairs at different levels: (1) a character-level representation module based on a convolutional neural network (CNN) to tackle the out-of-vocabulary problem in NLP; (2) a sentence-level representation module that adopts a pretrained language model, bidirectional encoder representation from transformers (BERT), to encode clinical text snippet pairs; and (3) an entity-level representation module to model clinical entity information in clinical text snippets. For the entity-level representation, we compare 2 methods: one encodes entities by the entity-type label sequence corresponding to the text snippet (called entity I), whereas the other encodes entities by their representation in MeSH, a knowledge graph in the medical domain (called entity II). RESULTS: We conducted experiments on the ClinicalSTS corpus of the 2019 n2c2/OHNLP challenge to evaluate model performance. The model using only BERT to encode text snippet pairs achieved a Pearson correlation coefficient (PCC) of 0.848. When character-level representation and entity-level representation were individually added to our model, the PCC increased to 0.857 and 0.854 (entity I)/0.859 (entity II), respectively. When both character-level representation and entity-level representation were added to our model, the PCC further increased to 0.861 (entity I) and 0.868 (entity II). CONCLUSIONS: Experimental results show that both character-level information and entity-level information can effectively enhance the BERT-based STS model. |
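The abstract describes the architecture only at a high level. The following is a minimal, illustrative PyTorch sketch (not the authors' released code) of how the three representation levels could be combined for score regression: a character-level CNN, a BERT sentence encoder, and an embedding of entity-type labels (the "entity I" variant; the MeSH-based "entity II" embeddings are not sketched here). The checkpoint name, hidden sizes, character vocabulary size, entity-type inventory, and the 0-5 similarity scale are assumptions for illustration only.

```python
# Illustrative sketch of a BERT + character-level + entity-level STS regressor.
# Assumptions (not taken from the paper): generic "bert-base-uncased" checkpoint,
# arbitrary hidden sizes, and a 0-5 similarity scale as commonly used in STS tasks.
import torch
import torch.nn as nn
from transformers import BertModel


class CharCNN(nn.Module):
    """Character-level encoder: embed characters, apply a 1D CNN, max-pool over time."""

    def __init__(self, n_chars=128, char_dim=32, n_filters=64, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=1)

    def forward(self, char_ids):                      # (batch, n_chars_in_pair)
        x = self.embed(char_ids).transpose(1, 2)      # (batch, char_dim, length)
        x = torch.relu(self.conv(x))                  # (batch, n_filters, length)
        return x.max(dim=2).values                    # (batch, n_filters)


class ClinicalSTSModel(nn.Module):
    """Concatenate BERT pooled output, character-level, and entity-level vectors,
    then regress a similarity score in [0, 5]."""

    def __init__(self, n_entity_types=50, entity_dim=32):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # stand-in checkpoint
        self.char_cnn = CharCNN()
        self.entity_embed = nn.Embedding(n_entity_types, entity_dim, padding_idx=0)
        hidden = self.bert.config.hidden_size + 64 + entity_dim
        self.regressor = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, input_ids, attention_mask, char_ids, entity_type_ids):
        # Sentence-level: pooled representation of the snippet pair.
        cls = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        # Character-level: CNN over the character sequence of the pair.
        char_vec = self.char_cnn(char_ids)
        # Entity-level ("entity I"): average of entity-type label embeddings
        # (simplified; padding positions are included in the mean here).
        ent_vec = self.entity_embed(entity_type_ids).mean(dim=1)
        features = torch.cat([cls, char_vec, ent_vec], dim=-1)
        return 5.0 * torch.sigmoid(self.regressor(features)).squeeze(-1)


def pearson_cc(pred, gold):
    """Pearson correlation coefficient between predicted and gold scores
    (the evaluation metric reported in the abstract)."""
    pred = torch.as_tensor(pred, dtype=torch.float)
    gold = torch.as_tensor(gold, dtype=torch.float)
    pred_c, gold_c = pred - pred.mean(), gold - gold.mean()
    return (pred_c * gold_c).sum() / (pred_c.norm() * gold_c.norm())
```

The design choice mirrored here is simple late fusion: each module produces a fixed-size vector, the vectors are concatenated, and a small feed-forward head regresses the similarity score; swapping in MeSH-derived entity embeddings ("entity II") would only change how `ent_vec` is produced.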
format | Online Article Text |
id | pubmed-7803475 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7803475 2021-01-15 Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study Xiong, Ying Chen, Shuai Chen, Qingcai Yan, Jun Tang, Buzhou JMIR Med Inform Original Paper BACKGROUND: The widespread adoption of electronic health records (EHRs) has improved the quality of health care. However, EHRs have also introduced problems, such as the growing use of copy-and-paste and templates, which results in records with low-quality content. To minimize data redundancy across documents, Harvard Medical School and Mayo Clinic organized a national natural language processing (NLP) clinical challenge (n2c2) on clinical semantic textual similarity (ClinicalSTS) in 2019. The task of this challenge is to compute the semantic similarity between clinical text snippets. OBJECTIVE: In this study, we aim to investigate novel methods to model ClinicalSTS and analyze the results. METHODS: We propose a semantically enhanced text matching model for the 2019 n2c2/Open Health NLP (OHNLP) challenge on ClinicalSTS. The model includes 3 representation modules to encode clinical text snippet pairs at different levels: (1) a character-level representation module based on a convolutional neural network (CNN) to tackle the out-of-vocabulary problem in NLP; (2) a sentence-level representation module that adopts a pretrained language model, bidirectional encoder representation from transformers (BERT), to encode clinical text snippet pairs; and (3) an entity-level representation module to model clinical entity information in clinical text snippets. For the entity-level representation, we compare 2 methods: one encodes entities by the entity-type label sequence corresponding to the text snippet (called entity I), whereas the other encodes entities by their representation in MeSH, a knowledge graph in the medical domain (called entity II). RESULTS: We conducted experiments on the ClinicalSTS corpus of the 2019 n2c2/OHNLP challenge to evaluate model performance. The model using only BERT to encode text snippet pairs achieved a Pearson correlation coefficient (PCC) of 0.848. When character-level representation and entity-level representation were individually added to our model, the PCC increased to 0.857 and 0.854 (entity I)/0.859 (entity II), respectively. When both character-level representation and entity-level representation were added to our model, the PCC further increased to 0.861 (entity I) and 0.868 (entity II). CONCLUSIONS: Experimental results show that both character-level information and entity-level information can effectively enhance the BERT-based STS model. JMIR Publications 2020-12-29 /pmc/articles/PMC7803475/ /pubmed/33372664 http://dx.doi.org/10.2196/23357 Text en ©Ying Xiong, Shuai Chen, Qingcai Chen, Jun Yan, Buzhou Tang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 29.12.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
|
spellingShingle | Original Paper Xiong, Ying Chen, Shuai Chen, Qingcai Yan, Jun Tang, Buzhou Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study |
title | Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study |
title_full | Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study |
title_fullStr | Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study |
title_full_unstemmed | Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study |
title_short | Using Character-Level and Entity-Level Representations to Enhance Bidirectional Encoder Representation From Transformers-Based Clinical Semantic Textual Similarity Model: ClinicalSTS Modeling Study |
title_sort | using character-level and entity-level representations to enhance bidirectional encoder representation from transformers-based clinical semantic textual similarity model: clinicalsts modeling study |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7803475/ https://www.ncbi.nlm.nih.gov/pubmed/33372664 http://dx.doi.org/10.2196/23357 |
work_keys_str_mv | AT xiongying usingcharacterlevelandentitylevelrepresentationstoenhancebidirectionalencoderrepresentationfromtransformersbasedclinicalsemantictextualsimilaritymodelclinicalstsmodelingstudy AT chenshuai usingcharacterlevelandentitylevelrepresentationstoenhancebidirectionalencoderrepresentationfromtransformersbasedclinicalsemantictextualsimilaritymodelclinicalstsmodelingstudy AT chenqingcai usingcharacterlevelandentitylevelrepresentationstoenhancebidirectionalencoderrepresentationfromtransformersbasedclinicalsemantictextualsimilaritymodelclinicalstsmodelingstudy AT yanjun usingcharacterlevelandentitylevelrepresentationstoenhancebidirectionalencoderrepresentationfromtransformersbasedclinicalsemantictextualsimilaritymodelclinicalstsmodelingstudy AT tangbuzhou usingcharacterlevelandentitylevelrepresentationstoenhancebidirectionalencoderrepresentationfromtransformersbasedclinicalsemantictextualsimilaritymodelclinicalstsmodelingstudy |