O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information

Bibliographic Details
Main Authors: Soares, Felipe; Tateisi, Yuka; Takatsuki, Terue; Yamaguchi, Atsuko
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510863/
https://www.ncbi.nlm.nih.gov/pubmed/34638173
http://dx.doi.org/10.5808/gi.21014
author Soares, Felipe
Tateisi, Yuka
Takatsuki, Terue
Yamaguchi, Atsuko
collection PubMed
description Previous approaches to create a controlled vocabulary for Japanese have resorted to existing bilingual dictionary and transformation rules to allow such mappings. However, given the possible new terms introduced due to coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. We propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19 related publications in this work. For such, we resorted to manual curation of several bilingual dictionaries and a computational approach based on machine translation of sentences containing such terms and the ranking of possible translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach presented average accuracy of 63.33% for all terms, and 84.51% for drugs and chemicals.
format Online
Article
Text
id pubmed-8510863
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-8510863 2021-10-22 Genomics Inform Korea Genome Organization 2021-09-30 /pmc/articles/PMC8510863/ /pubmed/34638173 http://dx.doi.org/10.5808/gi.21014 Text en (c) 2021, Korea Genome Organization https://creativecommons.org/licenses/by/4.0/ (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510863/
https://www.ncbi.nlm.nih.gov/pubmed/34638173
http://dx.doi.org/10.5808/gi.21014