Building a high-quality sense inventory for improved abbreviation disambiguation
Motivation: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions...
Main authors: | Okazaki, Naoaki; Ananiadou, Sophia; Tsujii, Jun'ichi |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859134/ https://www.ncbi.nlm.nih.gov/pubmed/20360059 http://dx.doi.org/10.1093/bioinformatics/btq129 |
_version_ | 1782180483915841536 |
---|---|
author | Okazaki, Naoaki Ananiadou, Sophia Tsujii, Jun'ichi |
author_facet | Okazaki, Naoaki Ananiadou, Sophia Tsujii, Jun'ichi |
author_sort | Okazaki, Naoaki |
collection | PubMed |
description | Motivation: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions of bio-medical texts. However, expanded forms extracted by abbreviation recognition are mixtures of concepts/senses and their term variations. Consequently, a list of expanded forms should be structured into a sense inventory, which provides possible concepts or senses for abbreviation disambiguation. Results: A sense inventory is a key to robust management of abbreviations. Therefore, we present a supervised approach for clustering expanded forms. The experimental result reports 0.915 F1 score in clustering expanded forms. We then investigate the possibility of conflicts of protein and gene names with abbreviations. Finally, an experiment of abbreviation disambiguation on the sense inventory yielded 0.984 accuracy and 0.986 F1 score using the dataset obtained from MEDLINE abstracts. Availability: The sense inventory and disambiguator of abbreviations are accessible at http://www.nactem.ac.uk/software/acromine/ and http://www.nactem.ac.uk/software/acromine_disambiguation/ Contact: okazaki@chokkan.org |
format | Text |
id | pubmed-2859134 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2859134 2010-04-26 Building a high-quality sense inventory for improved abbreviation disambiguation Okazaki, Naoaki Ananiadou, Sophia Tsujii, Jun'ichi Bioinformatics Original Papers Motivation: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions of bio-medical texts. However, expanded forms extracted by abbreviation recognition are mixtures of concepts/senses and their term variations. Consequently, a list of expanded forms should be structured into a sense inventory, which provides possible concepts or senses for abbreviation disambiguation. Results: A sense inventory is a key to robust management of abbreviations. Therefore, we present a supervised approach for clustering expanded forms. The experimental result reports 0.915 F1 score in clustering expanded forms. We then investigate the possibility of conflicts of protein and gene names with abbreviations. Finally, an experiment of abbreviation disambiguation on the sense inventory yielded 0.984 accuracy and 0.986 F1 score using the dataset obtained from MEDLINE abstracts. Availability: The sense inventory and disambiguator of abbreviations are accessible at http://www.nactem.ac.uk/software/acromine/ and http://www.nactem.ac.uk/software/acromine_disambiguation/ Contact: okazaki@chokkan.org Oxford University Press 2010-05-01 2010-03-30 /pmc/articles/PMC2859134/ /pubmed/20360059 http://dx.doi.org/10.1093/bioinformatics/btq129 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Okazaki, Naoaki Ananiadou, Sophia Tsujii, Jun'ichi Building a high-quality sense inventory for improved abbreviation disambiguation |
title | Building a high-quality sense inventory for improved abbreviation disambiguation |
title_full | Building a high-quality sense inventory for improved abbreviation disambiguation |
title_fullStr | Building a high-quality sense inventory for improved abbreviation disambiguation |
title_full_unstemmed | Building a high-quality sense inventory for improved abbreviation disambiguation |
title_short | Building a high-quality sense inventory for improved abbreviation disambiguation |
title_sort | building a high-quality sense inventory for improved abbreviation disambiguation |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859134/ https://www.ncbi.nlm.nih.gov/pubmed/20360059 http://dx.doi.org/10.1093/bioinformatics/btq129 |
work_keys_str_mv | AT okazakinaoaki buildingahighqualitysenseinventoryforimprovedabbreviationdisambiguation AT ananiadousophia buildingahighqualitysenseinventoryforimprovedabbreviationdisambiguation AT tsujiijunichi buildingahighqualitysenseinventoryforimprovedabbreviationdisambiguation |