A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations

Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually tr...

Bibliographic Details
Main Authors: Bollegala, Danushka, Kontonatsios, Georgios, Ananiadou, Sophia
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4452086/
https://www.ncbi.nlm.nih.gov/pubmed/26030738
http://dx.doi.org/10.1371/journal.pone.0126196
_version_ 1782374248544731136
author Bollegala, Danushka
Kontonatsios, Georgios
Ananiadou, Sophia
author_facet Bollegala, Danushka
Kontonatsios, Georgios
Ananiadou, Sophia
author_sort Bollegala, Danushka
collection PubMed
description Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. 
Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks.
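
The two feature types named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the n-gram range (2 to 3), the window size (2 tokens), and the assumption that a term is a single token are all hypothetical choices made for illustration. The PVP projection and the PLSR mapping are not shown.

```python
from collections import Counter


def char_ngrams(term, n_min=2, n_max=3):
    """Intrinsic features: character n-grams taken from the term itself.

    The n-gram range is an illustrative assumption, not a value from the record.
    """
    text = term.lower()
    feats = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            feats[text[i:i + n]] += 1
    return feats


def context_features(tokens, term_index, window=2):
    """Extrinsic features: unigrams and bigrams from the windows around the term.

    Assumes the term occupies exactly one token at position term_index;
    the window size is an illustrative assumption.
    """
    left = tokens[max(0, term_index - window):term_index]
    right = tokens[term_index + 1:term_index + 1 + window]
    feats = Counter()
    for side in (left, right):
        feats.update(side)  # unigrams
        feats.update(" ".join(pair) for pair in zip(side, side[1:]))  # bigrams
    return feats


if __name__ == "__main__":
    sentence = ["the", "mutated", "gene", "causes", "rare", "disease"]
    print(char_ngrams("gene"))
    print(context_features(sentence, term_index=2))
```

Each function returns a sparse count vector for a term in one language. Comparing such vectors across languages would still need a learned mapping between the two feature spaces, which the description attributes to PLSR.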
format Online
Article
Text
id pubmed-4452086
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-44520862015-06-09 A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations Bollegala, Danushka Kontonatsios, Georgios Ananiadou, Sophia PLoS One Research Article Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. 
The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks. Public Library of Science 2015-06-01 /pmc/articles/PMC4452086/ /pubmed/26030738 http://dx.doi.org/10.1371/journal.pone.0126196 Text en © 2015 Bollegala et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Bollegala, Danushka
Kontonatsios, Georgios
Ananiadou, Sophia
A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
title A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
title_full A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
title_fullStr A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
title_full_unstemmed A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
title_short A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
title_sort cross-lingual similarity measure for detecting biomedical term translations
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4452086/
https://www.ncbi.nlm.nih.gov/pubmed/26030738
http://dx.doi.org/10.1371/journal.pone.0126196
work_keys_str_mv AT bollegaladanushka acrosslingualsimilaritymeasurefordetectingbiomedicaltermtranslations
AT kontonatsiosgeorgios acrosslingualsimilaritymeasurefordetectingbiomedicaltermtranslations
AT ananiadousophia acrosslingualsimilaritymeasurefordetectingbiomedicaltermtranslations
AT bollegaladanushka crosslingualsimilaritymeasurefordetectingbiomedicaltermtranslations
AT kontonatsiosgeorgios crosslingualsimilaritymeasurefordetectingbiomedicaltermtranslations
AT ananiadousophia crosslingualsimilaritymeasurefordetectingbiomedicaltermtranslations