
Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches


Bibliographic Details
Main Authors: Pyysalo, Sampo; Salakoski, Tapio; Aubin, Sophie; Nazarenko, Adeline
Format: Text
Language: English
Published: BioMed Central 2006
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764446/
https://www.ncbi.nlm.nih.gov/pubmed/17134475
http://dx.doi.org/10.1186/1471-2105-7-S3-S2
collection PubMed
description BACKGROUND: We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. RESULTS: In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. CONCLUSION: When available, a high-quality domain part-of-speech tagger is the best solution to unknown word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license.
format Text
id pubmed-1764446
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1764446 2007-01-09 · BMC Bioinformatics · Proceedings · BioMed Central 2006-11-24 · Text · en
Copyright © 2006 Pyysalo et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings