
Building a high-quality sense inventory for improved abbreviation disambiguation

Motivation: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions...


Bibliographic Details
Main Authors:	Okazaki, Naoaki; Ananiadou, Sophia; Tsujii, Jun'ichi
Format:	Text
Language:	English
Published:	Oxford University Press 2010
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859134/ https://www.ncbi.nlm.nih.gov/pubmed/20360059 http://dx.doi.org/10.1093/bioinformatics/btq129

author	Okazaki, Naoaki; Ananiadou, Sophia; Tsujii, Jun'ichi
collection	PubMed
description	Motivation: The ultimate goal of abbreviation management is to disambiguate every occurrence of an abbreviation into its expanded form (concept or sense). To collect expanded forms for abbreviations, previous studies have recognized abbreviations and their expanded forms in parenthetical expressions of bio-medical texts. However, expanded forms extracted by abbreviation recognition are mixtures of concepts/senses and their term variations. Consequently, a list of expanded forms should be structured into a sense inventory, which provides possible concepts or senses for abbreviation disambiguation. Results: A sense inventory is a key to robust management of abbreviations. Therefore, we present a supervised approach for clustering expanded forms. The experimental result reports 0.915 F1 score in clustering expanded forms. We then investigate the possibility of conflicts of protein and gene names with abbreviations. Finally, an experiment of abbreviation disambiguation on the sense inventory yielded 0.984 accuracy and 0.986 F1 score using the dataset obtained from MEDLINE abstracts. Availability: The sense inventory and disambiguator of abbreviations are accessible at http://www.nactem.ac.uk/software/acromine/ and http://www.nactem.ac.uk/software/acromine_disambiguation/ Contact: okazaki@chokkan.org
format	Text
id	pubmed-2859134
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-2859134 2010-04-26 Building a high-quality sense inventory for improved abbreviation disambiguation Okazaki, Naoaki; Ananiadou, Sophia; Tsujii, Jun'ichi Bioinformatics Original Papers Oxford University Press 2010-05-01 2010-03-30 /pmc/articles/PMC2859134/ /pubmed/20360059 http://dx.doi.org/10.1093/bioinformatics/btq129 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Building a high-quality sense inventory for improved abbreviation disambiguation
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859134/ https://www.ncbi.nlm.nih.gov/pubmed/20360059 http://dx.doi.org/10.1093/bioinformatics/btq129
