
Building a high-quality sense inventory for improved abbreviation disambiguation


Bibliographic Details
Main authors: Okazaki, Naoaki; Ananiadou, Sophia; Tsujii, Jun'ichi
Format: Text
Language: English
Published: Oxford University Press, 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859134/
https://www.ncbi.nlm.nih.gov/pubmed/20360059
http://dx.doi.org/10.1093/bioinformatics/btq129
author Okazaki, Naoaki
Ananiadou, Sophia
Tsujii, Jun'ichi
collection PubMed
description Motivation: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions in biomedical texts. However, the expanded forms extracted by abbreviation recognition mix distinct concepts/senses with term variations of those concepts. Consequently, the list of expanded forms should be structured into a sense inventory, which provides the possible concepts or senses for abbreviation disambiguation. Results: Because a sense inventory is key to robust management of abbreviations, we present a supervised approach for clustering expanded forms, which achieves an F1 score of 0.915 in the clustering experiment. We then investigate potential conflicts between protein/gene names and abbreviations. Finally, an abbreviation-disambiguation experiment based on the sense inventory, using a dataset obtained from MEDLINE abstracts, yielded an accuracy of 0.984 and an F1 score of 0.986. Availability: The sense inventory and the abbreviation disambiguator are accessible at http://www.nactem.ac.uk/software/acromine/ and http://www.nactem.ac.uk/software/acromine_disambiguation/ Contact: okazaki@chokkan.org
format Text
id pubmed-2859134
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
journal Bioinformatics (Original Papers)
dates 2010-04-26, 2010-05-01, 2010-03-30
rights © The Author(s) 2010. Published by Oxford University Press.
license http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Building a high-quality sense inventory for improved abbreviation disambiguation
topic Original Papers
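To make the abstract's distinction between senses and term variations concrete, the following is a minimal, hypothetical sketch. It is neither the authors' supervised clustering method nor the Acromine data format. It assumes some clustering step has already assigned each extracted expanded form a sense label, and it only groups the variants under each abbreviation. The function names (`build_sense_inventory`, `candidate_senses`), the sense labels (`S1`, `S2`) and the lowercase normalization are illustrative assumptions.

```python
from collections import defaultdict


def build_sense_inventory(assignments):
    """Group expanded forms of each abbreviation into senses (hypothetical sketch).

    `assignments` is an iterable of (abbreviation, expanded_form, sense_id)
    triples, assumed to come from a clustering step that is not shown here.
    Returns {abbreviation: {sense_id: sorted list of variant expanded forms}}.
    """
    inventory = defaultdict(lambda: defaultdict(set))
    for abbrev, expanded, sense_id in assignments:
        # Assumption: lowercasing is enough to merge trivial case variants.
        inventory[abbrev][sense_id].add(expanded.lower())
    # Freeze into plain dicts with deterministic (sorted) variant lists.
    return {
        abbrev: {sid: sorted(forms) for sid, forms in senses.items()}
        for abbrev, senses in inventory.items()
    }


def candidate_senses(inventory, abbrev):
    """Return the sense ids a disambiguator would have to choose between."""
    return sorted(inventory.get(abbrev, {}))


if __name__ == "__main__":
    # Illustrative input: five expanded forms of "PCR" that reduce to two senses.
    assignments = [
        ("PCR", "polymerase chain reaction", "S1"),
        ("PCR", "Polymerase Chain Reaction", "S1"),
        ("PCR", "polymerase chain reactions", "S1"),
        ("PCR", "pathologic complete response", "S2"),
        ("PCR", "pathological complete response", "S2"),
    ]
    inventory = build_sense_inventory(assignments)
    print(candidate_senses(inventory, "PCR"))
    for sid, forms in inventory["PCR"].items():
        print(sid, forms)
```

Here five extracted expanded forms collapse into two candidate senses, which is the set a disambiguator would choose from. In the paper the grouping comes from a supervised clustering model, which this sketch does not attempt to reproduce.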