Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary
BACKGROUND: Electronic medical record (EMR) systems have become widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual medical lexicon of Chinese and English is needed to meet the demand for the multi-lingual and multi-national trea...
Main authors: | Xu, Yan; Chen, Luoxin; Wei, Junsheng; Ananiadou, Sophia; Fan, Yubo; Qian, Yi; Chang, Eric I-Chao; Tsujii, Junichi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4424557/ https://www.ncbi.nlm.nih.gov/pubmed/25956056 http://dx.doi.org/10.1186/s12859-015-0606-0 |
_version_ | 1782370349626687488 |
---|---|
author | Xu, Yan; Chen, Luoxin; Wei, Junsheng; Ananiadou, Sophia; Fan, Yubo; Qian, Yi; Chang, Eric I-Chao; Tsujii, Junichi |
author_sort | Xu, Yan |
collection | PubMed |
description | BACKGROUND: Electronic medical record (EMR) systems have become widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual Chinese-English medical lexicon is needed to meet the demand for multilingual and multinational treatment. We aim to extract a bilingual lexicon from English and Chinese discharge summaries with a small seed lexicon. The lexical terms can be classified into two categories: single-word terms (SWTs) and multi-word terms (MWTs). For SWTs, we use a label propagation (LP; context-based) method to extract candidate translation pairs. For MWTs, which are pervasive in the medical domain, we propose a term alignment method, which first obtains translation candidates for each component word of a Chinese MWT and then generates their combinations, from which the system selects a set of plausible translation candidates. RESULTS: We compare our LP method with a baseline method based on simple context similarity. The LP-based method outperforms the baseline, achieving accuracies of 4.44% Acc1, 24.44% Acc10, and 62.22% Acc100, where AccN means the top-N accuracy. The accuracy of the LP method drops to 5.41% Acc10 and 8.11% Acc20 for MWTs. Our experiments show that the method based on term alignment improves the performance for MWTs to 16.22% Acc10 and 27.03% Acc20. CONCLUSIONS: We constructed a framework for building an English-Chinese term dictionary from discharge summaries in the two languages. Our experiments have shown that the LP-based method, augmented with the term alignment method, can reduce the manual work required to compile a bilingual dictionary of clinical terms. |
format | Online Article Text |
id | pubmed-4424557 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4424557 2015-05-09 Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary Xu, Yan; Chen, Luoxin; Wei, Junsheng; Ananiadou, Sophia; Fan, Yubo; Qian, Yi; Chang, Eric I-Chao; Tsujii, Junichi BMC Bioinformatics Research Article BioMed Central 2015-05-09 /pmc/articles/PMC4424557/ /pubmed/25956056 http://dx.doi.org/10.1186/s12859-015-0606-0 Text en © Xu et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4424557/ https://www.ncbi.nlm.nih.gov/pubmed/25956056 http://dx.doi.org/10.1186/s12859-015-0606-0 |
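
The description above reports results as AccN (top-N accuracy) and describes generating combinations of per-word translation candidates for a multi-word term. The following is a minimal sketch of those two ideas only, not the authors' implementation: the function names, the toy terms, and the candidate lists are all hypothetical, and the step in which the system selects plausible candidates from the combinations is not shown.

```python
from itertools import product


def top_n_accuracy(ranked, gold, n):
    """Fraction of source terms whose gold translation is in the top n candidates."""
    if not ranked:
        return 0.0
    hits = sum(1 for term, cands in ranked.items() if gold.get(term) in cands[:n])
    return hits / len(ranked)


def combine_components(component_candidates):
    """Generate every combination of per-word translation candidates for one MWT."""
    return [" ".join(words) for words in product(*component_candidates)]


# Toy, made-up data for illustration only (not taken from the article).
ranked = {
    "term_a": ["x", "y", "z"],  # ranked candidate translations for term_a
    "term_b": ["p", "q", "r"],  # ranked candidate translations for term_b
}
gold = {"term_a": "y", "term_b": "s"}  # hypothetical reference translations

acc1 = top_n_accuracy(ranked, gold, 1)
acc2 = top_n_accuracy(ranked, gold, 2)

# Hypothetical candidates for each component word of a two-word term.
combos = combine_components([["chronic", "slow"], ["bronchitis"]])
```

With this toy input, `term_a` is matched only once the second-ranked candidate is considered, which is why accuracy can rise as N grows, as it does between Acc1, Acc10, and Acc100 in the reported results.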