Creating a medical dictionary using word alignment: The influence of sources and resources

Bibliographic details
Main authors: Nyström, Mikael; Merkel, Magnus; Petersson, Håkan; Åhlfeldt, Hans
Format: Text
Language: English
Published: BioMed Central, 2007
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2267171/
https://www.ncbi.nlm.nih.gov/pubmed/18036221
http://dx.doi.org/10.1186/1472-6947-7-37
Collection: PubMed
Description:

BACKGROUND: Automatic word alignment of parallel texts with the same content in different languages is used, among other things, to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and the types of resources influence the quality.

METHODS: We automatically word aligned the terminology systems using static resources, such as dictionaries; statistical resources, such as statistically derived dictionaries; and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, in order to explore how the different terminology systems and resources influence recall and precision. After the analysis, we used the best configuration of the automatic word alignment to generate candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary.

RESULTS: The results indicate that more resources and more resource types give better results, but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, whereas resources generated from MeSH were less general than the other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as the union of static, statistical and training resources, and noticeably better results than the union of static and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms.

CONCLUSION: More resources give better results in the automatic word alignment, but some resources give only small improvements. Training resources are the most important type of resource, and the most general resources were generated from ICD-10.
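The evaluation described above scores automatic word links against a manual word alignment using precision and recall, while varying which resource types are combined. The Python sketch below illustrates that idea with a hypothetical toy dictionary-lookup aligner. The functions `align` and `precision_recall`, the rubric pair, the gold alignment and all three lexicons are illustrative assumptions; they are not the authors' aligner, resources or data.

```python
"""Toy dictionary-lookup word aligner evaluated with precision and recall.

All resources and the gold alignment below are hypothetical illustrations,
not data from the paper.
"""
from typing import List, Set, Tuple

Link = Tuple[int, int]          # (English token index, Swedish token index)
Lexicon = Set[Tuple[str, str]]  # (English word, Swedish word) pairs


def align(src: List[str], tgt: List[str], lexicon: Lexicon) -> Set[Link]:
    """Link every token pair that the lexicon lists as a translation."""
    return {
        (i, j)
        for i, s in enumerate(src)
        for j, t in enumerate(tgt)
        if (s.lower(), t.lower()) in lexicon
    }


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Precision = correct / predicted links; recall = correct / gold links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical rubric pair: the Swedish compound covers two English words.
english = ["Acute", "myocardial", "infarction"]
swedish = ["Akut", "hjärtinfarkt"]
gold = {(0, 0), (1, 1), (2, 1)}  # manual (reference) word alignment

# Hypothetical resources of the three types named in the abstract.
static: Lexicon = {("acute", "akut")}
statistical: Lexicon = {("infarction", "hjärtinfarkt"), ("acute", "hjärtinfarkt")}  # one noisy entry
training: Lexicon = {("myocardial", "hjärtinfarkt"), ("infarction", "hjärtinfarkt")}

# Compare alignment quality as more resource types are combined (set union).
configs = {
    "static": static,
    "static+statistical": static | statistical,
    "static+statistical+training": static | statistical | training,
}
results = {}
for name, lexicon in configs.items():
    p, r = precision_recall(align(english, swedish, lexicon), gold)
    results[name] = (p, r)
    print(f"{name:30s} precision={p:.2f} recall={r:.2f}")
```

With these toy inputs, adding resources raises recall from 0.33 to 1.00, while the noisy statistical entry lowers precision (0.67, then 0.75). The numbers only demonstrate the metrics and say nothing about the results measured in the paper. The compound "hjärtinfarkt" also hints at why structural differences between English and Swedish rubrics make one-to-one alignment harder.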
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Med Inform Decis Mak, Research Article, published 2007-11-23.
Copyright © 2007 Nyström et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.