A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually tr...
Main authors: | Bollegala, Danushka; Kontonatsios, Georgios; Ananiadou, Sophia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4452086/ https://www.ncbi.nlm.nih.gov/pubmed/26030738 http://dx.doi.org/10.1371/journal.pone.0126196 |
_version_ | 1782374248544731136 |
---|---|
author | Bollegala, Danushka Kontonatsios, Georgios Ananiadou, Sophia |
author_facet | Bollegala, Danushka Kontonatsios, Georgios Ananiadou, Sophia |
author_sort | Bollegala, Danushka |
collection | PubMed |
description | Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later manually translated into other languages. Although there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated into other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting the most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP), a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks. |
format | Online Article Text |
id | pubmed-4452086 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4452086 2015-06-09 A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations Bollegala, Danushka Kontonatsios, Georgios Ananiadou, Sophia PLoS One Research Article Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP), a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks. Public Library of Science 2015-06-01 /pmc/articles/PMC4452086/ /pubmed/26030738 http://dx.doi.org/10.1371/journal.pone.0126196 Text en © 2015 Bollegala et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Bollegala, Danushka Kontonatsios, Georgios Ananiadou, Sophia A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations |
title | A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations |
title_sort | cross-lingual similarity measure for detecting biomedical term translations |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4452086/ https://www.ncbi.nlm.nih.gov/pubmed/26030738 http://dx.doi.org/10.1371/journal.pone.0126196 |
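
The description field of this record mentions intrinsic features built from character n-grams of a term. The following is a minimal sketch of that one idea only, not the authors' method: it omits the extrinsic context features, prototype vector projection, and the PLSR mapping described in the abstract, and compares raw n-gram counts directly, which can only work for cognates written in the same script. The function names, the n-gram range (2 to 3), and the example terms are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n_min=2, n_max=3):
    """Count the character n-grams of a term for n in [n_min, n_max].

    The range 2..3 is an illustrative assumption, not a value from the paper.
    """
    text = term.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so report no similarity
    return dot / (norm_a * norm_b)


def rank_candidates(source_term, candidates):
    """Return (score, candidate) pairs sorted from most to least similar."""
    src = char_ngrams(source_term)
    scored = [(cosine(src, char_ngrams(c)), c) for c in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical English source term and French candidates.
    for score, term in rank_candidates("hepatitis", ["hépatite", "cardiologie"]):
        print(f"{term}: {score:.3f}")
```

Direct n-gram overlap of this kind cannot relate terms in different scripts, such as the English–Greek and English–Japanese pairs listed in the abstract, which is one reason a learned mapping between the two feature spaces is needed there.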