
Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary



Bibliographic Details
Main authors: Xu, Yan; Chen, Luoxin; Wei, Junsheng; Ananiadou, Sophia; Fan, Yubo; Qian, Yi; Chang, Eric I-Chao; Tsujii, Junichi
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4424557/
https://www.ncbi.nlm.nih.gov/pubmed/25956056
http://dx.doi.org/10.1186/s12859-015-0606-0
Description:
BACKGROUND: Electronic medical record (EMR) systems are now widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual Chinese-English medical lexicon is needed to meet the demands of multilingual and multinational treatment. We aim to extract a bilingual lexicon from English and Chinese discharge summaries, starting from a small seed lexicon. Lexical terms fall into two categories: single-word terms (SWTs) and multi-word terms (MWTs). For SWTs, we use a context-based label propagation (LP) method to extract candidate translation pairs. For MWTs, which are pervasive in the medical domain, we propose a term alignment method that first obtains translation candidates for each component word of a Chinese MWT and then generates their combinations, from which the system selects a set of plausible translation candidates.
RESULTS: We compare our LP method with a baseline based on simple context similarity. The LP-based method outperforms the baseline, with accuracies of 4.44% Acc1, 24.44% Acc10, and 62.22% Acc100, where AccN denotes top-N accuracy. For MWTs, the accuracy of the LP method drops to 5.41% Acc10 and 8.11% Acc20. Our experiments show that the term alignment method improves MWT performance to 16.22% Acc10 and 27.03% Acc20.
CONCLUSIONS: We constructed a framework for building an English-Chinese term dictionary from discharge summaries in the two languages. Our experiments show that the LP-based method, augmented with the term alignment method, will help reduce the manual work required to compile a bilingual dictionary of clinical terms.
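The MWT step described above (translate each component word, combine the candidates, keep the plausible combinations) and the AccN metric can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `align_mwt` and `acc_at_n`, the per-word candidate scores, and the English term list are hypothetical. It also assumes that component translations keep the Chinese word order, that a combination is scored by the product of its component scores, and that "plausible" means "attested in the English term list"; the record does not state the actual selection criterion.

```python
from itertools import product
from math import prod

# Hypothetical type: Chinese word -> list of (English candidate, score), standing
# in for the output of an SWT step such as label propagation.
WordCandidates = dict[str, list[tuple[str, float]]]


def align_mwt(zh_words: list[str], candidates: WordCandidates,
              en_terms: set[str], top_k: int = 10) -> list[tuple[str, float]]:
    """Rank English translations for a segmented Chinese multi-word term.

    Assumptions (illustrative only): monotone word order, score = product of
    component scores, and only combinations found in en_terms are kept.
    """
    per_word = [candidates.get(w, []) for w in zh_words]
    if any(not options for options in per_word):
        return []  # some component word has no translation candidate
    scored: dict[str, float] = {}
    for combo in product(*per_word):  # every combination of component candidates
        phrase = " ".join(word for word, _ in combo)
        score = prod(s for _, s in combo)
        if phrase in en_terms:  # plausibility filter (assumed criterion)
            scored[phrase] = max(score, scored.get(phrase, 0.0))
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def acc_at_n(ranked_lists: dict[str, list[str]], gold: dict[str, str], n: int) -> float:
    """Top-N accuracy (AccN): share of source terms whose gold translation
    appears among their first n ranked candidates."""
    if not ranked_lists:
        return 0.0
    hits = sum(1 for term, ranked in ranked_lists.items()
               if gold.get(term) in ranked[:n])
    return hits / len(ranked_lists)


# Toy data, invented for illustration (not from the paper's corpora).
candidates: WordCandidates = {
    "急性": [("acute", 0.9), ("sharp", 0.1)],
    "心肌": [("myocardial", 0.8), ("cardiac muscle", 0.2)],
    "梗死": [("infarction", 0.7), ("infarct", 0.3)],
}
en_terms = {"acute myocardial infarction", "myocardial infarction", "acute renal failure"}

ranked = align_mwt(["急性", "心肌", "梗死"], candidates, en_terms)
print(ranked)
print(acc_at_n({"急性 心肌 梗死": [p for p, _ in ranked]},
               {"急性 心肌 梗死": "acute myocardial infarction"}, n=1))  # → 1.0
```

Of the eight combinations generated for the toy term, only "acute myocardial infarction" survives the term-list filter. Without such a filter (or a corpus-based score in its place), the number of combinations grows multiplicatively with the number of component words.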
Record: pubmed-4424557 (MEDLINE/PubMed format), PubMed collection, National Center for Biotechnology Information
Source: BMC Bioinformatics, Research Article, BioMed Central, 2015-05-09
Rights: © Xu et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.