The 2019 n2c2/OHNLP Track on Clinical Semantic Textual Similarity: Overview

Bibliographic Details
Main authors:	Wang, Yanshan; Fu, Sunyang; Shen, Feichen; Henry, Sam; Uzuner, Ozlem; Liu, Hongfang
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2020
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7732706/ https://www.ncbi.nlm.nih.gov/pubmed/33245291 http://dx.doi.org/10.2196/23375

collection	PubMed
description	BACKGROUND: Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in electronic health record systems, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval. OBJECTIVE: Our objective was to release ClinicalSTS data sets and to motivate the natural language processing and biomedical informatics communities to tackle semantic textual similarity tasks in the clinical domain. METHODS: We organized the first BioCreative/OHNLP ClinicalSTS shared task in 2018 by making available a real-world ClinicalSTS data set. We continued the shared task in 2019 in collaboration with the National NLP Clinical Challenges (n2c2) and the Open Health Natural Language Processing (OHNLP) consortium and organized the 2019 n2c2/OHNLP ClinicalSTS track. We released a larger ClinicalSTS data set comprising 2054 clinical sentence pairs, including 1068 pairs from the 2018 shared task and 1006 new pairs from 2 electronic health record systems, GE and Epic. We released 80% (1642/2054) of the data to participating teams to develop and fine-tune their semantic textual similarity systems and used the remaining 20% (412/2054) as a blind test set to evaluate them. The workshop was held in conjunction with the American Medical Informatics Association 2019 Annual Symposium. 
RESULTS: Of the 78 international teams that signed on to the n2c2/OHNLP ClinicalSTS shared task, 33 produced a total of 87 valid system submissions. The top 3 systems were generated by IBM Research, the National Center for Biotechnology Information, and the University of Florida, with Pearson correlations of r=.9010, r=.8967, and r=.8864, respectively. Most top-performing systems used state-of-the-art neural language models, such as BERT and XLNet, and state-of-the-art deep learning training schemes, such as pretraining followed by fine-tuning and multitask learning. Overall, the participating systems performed better on the Epic sentence pairs than on the GE sentence pairs, even though GE sentence pairs made up a much larger portion of the training data. CONCLUSIONS: The 2019 n2c2/OHNLP ClinicalSTS shared task focused on computing semantic similarity for clinical text sentences generated from clinical notes in the real world. It attracted a large number of international teams. The ClinicalSTS shared task could continue to serve as a venue for researchers in the natural language processing and medical informatics communities to develop and improve semantic textual similarity techniques for clinical text.
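The results above are reported as Pearson correlations between system-predicted and gold-standard similarity scores on the blind test set. The Python sketch below shows how such a correlation can be computed; it is not the organizers' official evaluation script. The function name pearson_r is hypothetical, and the example scores are invented for illustration, assuming a 0 to 5 similarity scale as in general-domain STS.

```python
import math
from typing import Sequence


def pearson_r(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation between gold-standard and predicted similarity scores.

    Hypothetical helper for illustration; not the shared task's official scorer.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if len(gold) < 2:
        raise ValueError("need at least 2 sentence pairs")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    # Covariance and variances; the shared 1/n factors cancel in the ratio.
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / math.sqrt(var_g * var_p)


if __name__ == "__main__":
    # Invented scores for five hypothetical sentence pairs (assumed 0-5 scale).
    gold_scores = [0.0, 1.5, 3.0, 4.5, 5.0]
    system_scores = [0.5, 1.0, 3.2, 4.0, 4.8]
    print(f"Pearson r = {pearson_r(gold_scores, system_scores):.4f}")
```

Pearson r rewards a system whose scores rise and fall with the gold scores, even if its scores are shifted or rescaled, so a system can rank highly without reproducing the exact gold values.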
id	pubmed-7732706
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7732706 2020-12-22 JMIR Med Inform, Original Paper. JMIR Publications 2020-11-27. ©Yanshan Wang, Sunyang Fu, Feichen Shen, Sam Henry, Ozlem Uzuner, Hongfang Liu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 27.11.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
