Composition-driven symptom phrase recognition for Chinese medical consultation corpora
BACKGROUND: Symptom phrase recognition is essential to improve the use of unstructured medical consultation corpora for the development of automated question answering systems. A majority of previous works typically require enough manually annotated training data or as complete a symptom dictionary...
Main authors: | Gu, Xuan; Sun, Zhengya; Zhang, Wensheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8714445/ https://www.ncbi.nlm.nih.gov/pubmed/34961490 http://dx.doi.org/10.1186/s12911-021-01716-2 |
_version_ | 1784623915163713536 |
---|---|
author | Gu, Xuan; Sun, Zhengya; Zhang, Wensheng |
author_facet | Gu, Xuan; Sun, Zhengya; Zhang, Wensheng |
author_sort | Gu, Xuan |
collection | PubMed |
description | BACKGROUND: Symptom phrase recognition is essential to improve the use of unstructured medical consultation corpora for the development of automated question answering systems. A majority of previous works typically require enough manually annotated training data or as complete a symptom dictionary as possible. However, when applied to real scenarios, they will face a dilemma due to the scarcity of the annotated textual resources and the diversity of the spoken language expressions. METHODS: In this paper, we propose a composition-driven method to recognize the symptom phrases from Chinese medical consultation corpora without any annotations. The basic idea is to directly learn models that capture the composition, i.e., the arrangement of the symptom components (semantic units of words). We introduce an automatic annotation strategy for the standard symptom phrases which are collected from multiple data sources. In particular, we combine the position information and the interaction scores between symptom components to characterize the symptom phrases. Equipped with such models, we are allowed to robustly extract symptom phrases that are not seen before. RESULTS: Without any manual annotations, our method achieves strong positive results on symptom phrase recognition tasks. Experiments also show that our method enjoys great potential with access to plenty of corpora. CONCLUSIONS: Compositionality offers a feasible solution for extracting information from unstructured free text with scarce labels. |
format | Online Article Text |
id | pubmed-8714445 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8714445 2022-01-05 Composition-driven symptom phrase recognition for Chinese medical consultation corpora. Gu, Xuan; Sun, Zhengya; Zhang, Wensheng. BMC Med Inform Decis Mak. Research. BioMed Central 2021-12-27 /pmc/articles/PMC8714445/ /pubmed/34961490 http://dx.doi.org/10.1186/s12911-021-01716-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Gu, Xuan Sun, Zhengya Zhang, Wensheng Composition-driven symptom phrase recognition for Chinese medical consultation corpora |
title | Composition-driven symptom phrase recognition for Chinese medical consultation corpora |
title_sort | composition-driven symptom phrase recognition for chinese medical consultation corpora |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8714445/ https://www.ncbi.nlm.nih.gov/pubmed/34961490 http://dx.doi.org/10.1186/s12911-021-01716-2 |