ALBERT-Based Self-Ensemble Model With Semisupervised Learning and Data Augmentation for Clinical Semantic Textual Similarity Calculation: Algorithm Validation Study

BACKGROUND: In recent years, with increases in the amount of information available and the importance of information screening, increased attention has been paid to the calculation of textual semantic similarity. In the field of medicine, electronic medical records and medical research documents hav...

Bibliographic Details
Main authors:	Li, Junyi; Zhang, Xuejie; Zhou, Xiaobing
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7864778/ https://www.ncbi.nlm.nih.gov/pubmed/33480858 http://dx.doi.org/10.2196/23086

collection	PubMed
description	BACKGROUND: In recent years, as the amount of available information has grown and information screening has become more important, the calculation of textual semantic similarity has received increasing attention. In the field of medicine, electronic medical records and medical research documents have become important data resources for clinical research, and calculating the semantic similarity of medical texts has become an urgent problem. OBJECTIVE: This research aims to solve 2 problems: (1) the small size of medical data sets, which leads to insufficient learning and understanding by the models, and (2) the loss of information during long-distance propagation, which leaves the models unable to grasp key information. METHODS: This paper combines a text data augmentation method and a self-ensemble ALBERT model under semisupervised learning to perform clinical textual semantic similarity calculations. RESULTS: Compared with the methods in the 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task Track on Clinical Semantic Textual Similarity, our method surpasses the best result by 2 percentage points and achieves a Pearson correlation coefficient of 0.92. CONCLUSIONS: When a medical data set is small, data augmentation can increase its size, and improved semisupervised learning can boost the learning efficiency of the model. Additionally, self-ensemble methods improve model performance. Our method performed very well and has great potential to help address related medical problems.
format	Online Article Text
id	pubmed-7864778
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7864778 2021-02-10 JMIR Med Inform Original Paper JMIR Publications 2021-01-22 /pmc/articles/PMC7864778/ /pubmed/33480858 http://dx.doi.org/10.2196/23086 Text en ©Junyi Li, Xuejie Zhang, Xiaobing Zhou. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 22.01.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title	ALBERT-Based Self-Ensemble Model With Semisupervised Learning and Data Augmentation for Clinical Semantic Textual Similarity Calculation: Algorithm Validation Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7864778/ https://www.ncbi.nlm.nih.gov/pubmed/33480858 http://dx.doi.org/10.2196/23086

Similar items