
CapsTM: capsule network for Chinese medical text matching

BACKGROUND: Text matching (TM) is a fundamental task of natural language processing widely used in many application systems such as information retrieval, automatic question answering, machine translation, dialogue systems, reading comprehension, etc. In recent years, a large number of deep learning...


Bibliographic Details
Main Authors:	Yu, Xiaoming; Shen, Yedan; Ni, Yuan; Huang, Xiaowei; Wang, Xiaolong; Chen, Qingcai; Tang, Buzhou
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8322831/ https://www.ncbi.nlm.nih.gov/pubmed/34330253 http://dx.doi.org/10.1186/s12911-021-01442-9

author	Yu, Xiaoming; Shen, Yedan; Ni, Yuan; Huang, Xiaowei; Wang, Xiaolong; Chen, Qingcai; Tang, Buzhou
collection	PubMed
description	BACKGROUND: Text matching (TM) is a fundamental natural language processing task widely used in application systems such as information retrieval, automatic question answering, machine translation, dialogue systems, and reading comprehension. In recent years, many deep neural networks have been applied to TM and have repeatedly set new benchmark results. Among them, the convolutional neural network (CNN) is one of the most popular, but it has difficulty handling small samples and preserving the relative structure of features. In this paper, we propose CapsTM, a novel deep learning architecture for TM based on the capsule network, a newer type of neural network architecture that was proposed to address some of the shortcomings of CNNs and shows great potential in many tasks. METHODS: CapsTM is a five-layer neural network comprising an input layer, a representation layer, an aggregation layer, a capsule layer and a prediction layer. In the input layer, the two pieces of text are first individually converted into sequences of embeddings, which are then transformed by a highway network. In the representation layer, a Bidirectional Long Short-Term Memory (BiLSTM) network represents each piece of text, and an attention-based interaction matrix represents the interactive information between the two pieces of text. In the aggregation layer, the two kinds of representations are fused by a BiLSTM, and in the capsule layer they are further represented as capsules (vectors). Finally, the prediction layer is a connected network used for classification. CapsTM extends ESIM by adding a capsule layer before the prediction layer. RESULTS: We construct a corpus for Chinese medical question matching that contains 36,360 question pairs. The corpus is randomly split into three parts: a training set of 32,360 question pairs, a development set of 2,000 question pairs and a test set of 2,000 question pairs. On this corpus, we conduct a series of experiments to evaluate CapsTM and compare it with other state-of-the-art methods. CapsTM achieves the highest F-score, 0.8666. CONCLUSION: The experimental results demonstrate that CapsTM is effective for Chinese medical question matching and outperforms the other state-of-the-art methods in the comparison.
format	Online Article Text
id	pubmed-8322831
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8322831 2021-07-30 BMC Med Inform Decis Mak Research BioMed Central 2021-07-30 /pmc/articles/PMC8322831/ /pubmed/34330253 http://dx.doi.org/10.1186/s12911-021-01442-9 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	CapsTM: capsule network for Chinese medical text matching
topic	Research
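
The description above mentions two building blocks that are easy to illustrate in isolation: the attention-based interaction matrix of the representation layer and the capsules (vectors) of the capsule layer. The following is a minimal sketch, not the authors' implementation. It assumes the interaction matrix takes the dot-product soft-alignment form used in ESIM, and that capsule vectors are normalized with the "squash" nonlinearity from the original capsule network literature; the record itself does not confirm either detail. All function names are hypothetical, and plain Python lists stand in for the BiLSTM outputs.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interaction_matrix(a, b):
    """e[i][j] = a_i . b_j for token vectors a (m tokens) and b (n tokens).

    Assumption: dot-product scoring, as in ESIM.
    """
    return [[dot(ai, bj) for bj in b] for ai in a]


def soft_align(a, b):
    """For each a_i, return the attention-weighted sum of the b_j."""
    dim = len(b[0])
    aligned = []
    for row in interaction_matrix(a, b):
        weights = softmax(row)
        aligned.append(
            [sum(w * bj[k] for w, bj in zip(weights, b)) for k in range(dim)]
        )
    return aligned


def squash(s, eps=1e-9):
    """Capsule squashing (assumed): keep direction, map length into [0, 1)."""
    sq_norm = dot(s, s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


if __name__ == "__main__":
    # Toy 2-dimensional "token representations" for two short texts.
    text_a = [[1.0, 0.0], [0.5, 0.5]]
    text_b = [[1.0, 0.0], [0.0, 1.0]]
    print(interaction_matrix(text_a, text_b))
    print(soft_align(text_a, text_b))
    print(squash([3.0, 4.0]))
```

The squash step keeps a capsule's direction while bounding its length below 1, so the length can be read as a confidence-like score; how CapsTM routes and consumes these capsules is not specified in this record.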
