Cargando…

Composition-driven symptom phrase recognition for Chinese medical consultation corpora

BACKGROUND: Symptom phrase recognition is essential to improve the use of unstructured medical consultation corpora for the development of automated question answering systems. A majority of previous works typically require enough manually annotated training data or as complete a symptom dictionary...

Descripción completa

Detalles Bibliográficos
Autores principales: Gu, Xuan, Sun, Zhengya, Zhang, Wensheng
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8714445/
https://www.ncbi.nlm.nih.gov/pubmed/34961490
http://dx.doi.org/10.1186/s12911-021-01716-2
Descripción
Sumario:BACKGROUND: Symptom phrase recognition is essential to improve the use of unstructured medical consultation corpora for the development of automated question answering systems. A majority of previous works typically require enough manually annotated training data or as complete a symptom dictionary as possible. However, when applied to real scenarios, they will face a dilemma due to the scarcity of the annotated textual resources and the diversity of the spoken language expressions. METHODS: In this paper, we propose a composition-driven method to recognize the symptom phrases from Chinese medical consultation corpora without any annotations. The basic idea is to directly learn models that capture the composition, i.e., the arrangement of the symptom components (semantic units of words). We introduce an automatic annotation strategy for the standard symptom phrases which are collected from multiple data sources. In particular, we combine the position information and the interaction scores between symptom components to characterize the symptom phrases. Equipped with such models, we are allowed to robustly extract symptom phrases that are not seen before. RESULTS: Without any manual annotations, our method achieves strong positive results on symptom phrase recognition tasks. Experiments also show that our method enjoys great potential with access to plenty of corpora. CONCLUSIONS: Compositionality offers a feasible solution for extracting information from unstructured free text with scarce labels.