FlexiTerm: a flexible term recognition method

BACKGROUND: The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. The dictionary look-up approaches may not always be sui...

Bibliographic Details
Main authors: Spasić, Irena; Greenwood, Mark; Preece, Alun; Francis, Nick; Elwyn, Glyn
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3853334/
https://www.ncbi.nlm.nih.gov/pubmed/24112363
http://dx.doi.org/10.1186/2041-1480-4-27
_version_ 1782478812160720896
author Spasić, Irena
Greenwood, Mark
Preece, Alun
Francis, Nick
Elwyn, Glyn
author_facet Spasić, Irena
Greenwood, Mark
Preece, Alun
Francis, Nick
Elwyn, Glyn
author_sort Spasić, Irena
collection PubMed
description BACKGROUND: The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. The dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or the newly emerging types of media such as patient blogs, the main obstacles being the use of non-standardised terminology and high degree of term variation. RESULTS: In this paper, we describe FlexiTerm, a method for automatic term recognition from a domain-specific corpus, and evaluate its performance against five manually annotated corpora. FlexiTerm performs term recognition in two steps: linguistic filtering is used to select term candidates followed by calculation of termhood, a frequency-based measure used as evidence to qualify a candidate as a term. In order to improve the quality of termhood calculation, which may be affected by the term variation phenomena, FlexiTerm uses a range of methods to neutralise the main sources of variation in biomedical terms. It manages syntactic variation by processing candidates using a bag-of-words approach. Orthographic and morphological variations are dealt with using stemming in combination with lexical and phonetic similarity measures. The method was evaluated on five biomedical corpora. The highest values for precision (94.56%), recall (71.31%) and F-measure (81.31%) were achieved on a corpus of clinical notes. CONCLUSIONS: FlexiTerm is an open-source software tool for automatic term recognition. It incorporates a simple term variant normalisation method. The method proved to be more robust than the baseline against less formally structured texts, such as those found in patient blogs or medical notes. The software can be downloaded freely at http://www.cs.cf.ac.uk/flexiterm.
format Online
Article
Text
id pubmed-3853334
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3853334 2013-12-18 FlexiTerm: a flexible term recognition method Spasić, Irena Greenwood, Mark Preece, Alun Francis, Nick Elwyn, Glyn J Biomed Semantics Research BioMed Central 2013-10-10 /pmc/articles/PMC3853334/ /pubmed/24112363 http://dx.doi.org/10.1186/2041-1480-4-27 Text en Copyright © 2013 Spasić et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Spasić, Irena
Greenwood, Mark
Preece, Alun
Francis, Nick
Elwyn, Glyn
FlexiTerm: a flexible term recognition method
title FlexiTerm: a flexible term recognition method
title_full FlexiTerm: a flexible term recognition method
title_fullStr FlexiTerm: a flexible term recognition method
title_full_unstemmed FlexiTerm: a flexible term recognition method
title_short FlexiTerm: a flexible term recognition method
title_sort flexiterm: a flexible term recognition method
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3853334/
https://www.ncbi.nlm.nih.gov/pubmed/24112363
http://dx.doi.org/10.1186/2041-1480-4-27
work_keys_str_mv AT spasicirena flexitermaflexibletermrecognitionmethod
AT greenwoodmark flexitermaflexibletermrecognitionmethod
AT preecealun flexitermaflexibletermrecognitionmethod
AT francisnick flexitermaflexibletermrecognitionmethod
AT elwynglyn flexitermaflexibletermrecognitionmethod
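
The abstract in this record says that FlexiTerm neutralises syntactic variation with a bag-of-words approach and uses stemming for morphological variation before computing a frequency-based termhood. The sketch below illustrates that general idea only. The names crude_stem, STOP_WORDS, normalise and variant_frequencies are hypothetical, the suffix stripper is a toy stand-in for whatever stemmer the tool uses, and the raw count is a simplification, not the termhood measure from the paper.

```python
from collections import Counter

# Hypothetical stop-word list; the record does not say which words are filtered.
STOP_WORDS = frozenset({"of", "the", "in", "a", "an"})


def crude_stem(token: str) -> str:
    """Toy plural stripper (an assumption, not FlexiTerm's actual stemmer)."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalise(candidate: str) -> frozenset:
    """Map a term candidate to an unordered set of stems (bag of words)."""
    return frozenset(
        crude_stem(tok) for tok in candidate.split() if tok.lower() not in STOP_WORDS
    )


def variant_frequencies(candidates) -> Counter:
    """Count candidates after normalisation, so that variants are pooled."""
    return Counter(normalise(c) for c in candidates)


candidates = [
    "wound infection",
    "infection of the wound",
    "wound infections",
    "knee pain",
]
counts = variant_frequencies(candidates)
for stems, freq in counts.items():
    print(sorted(stems), freq)
```

Because word order and plural endings are discarded, the three "wound infection" variants fall into one group and contribute to a single frequency, which is the effect the abstract attributes to variant normalisation.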