Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches
BACKGROUND: We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of m...
Main authors: | Pyysalo, Sampo; Salakoski, Tapio; Aubin, Sophie; Nazarenko, Adeline |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764446/ https://www.ncbi.nlm.nih.gov/pubmed/17134475 http://dx.doi.org/10.1186/1471-2105-7-S3-S2 |
_version_ | 1782131615288262656 |
---|---|
author | Pyysalo, Sampo Salakoski, Tapio Aubin, Sophie Nazarenko, Adeline |
author_facet | Pyysalo, Sampo Salakoski, Tapio Aubin, Sophie Nazarenko, Adeline |
author_sort | Pyysalo, Sampo |
collection | PubMed |
description | BACKGROUND: We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. RESULTS: In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. CONCLUSION: When available, a high-quality domain part-of-speech tagger is the best solution to unknown word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license. |
format | Text |
id | pubmed-1764446 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1764446 2007-01-09 Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches Pyysalo, Sampo Salakoski, Tapio Aubin, Sophie Nazarenko, Adeline BMC Bioinformatics Proceedings BACKGROUND: We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. RESULTS: In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. CONCLUSION: When available, a high-quality domain part-of-speech tagger is the best solution to unknown word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license. BioMed Central 2006-11-24 /pmc/articles/PMC1764446/ /pubmed/17134475 http://dx.doi.org/10.1186/1471-2105-7-S3-S2 Text en Copyright © 2006 Pyysalo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Pyysalo, Sampo Salakoski, Tapio Aubin, Sophie Nazarenko, Adeline Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches |
title | Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches |
title_sort | lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764446/ https://www.ncbi.nlm.nih.gov/pubmed/17134475 http://dx.doi.org/10.1186/1471-2105-7-S3-S2 |
work_keys_str_mv | AT pyysalosampo lexicaladaptationoflinkgrammartothebiomedicalsublanguageacomparativeevaluationofthreeapproaches AT salakoskitapio lexicaladaptationoflinkgrammartothebiomedicalsublanguageacomparativeevaluationofthreeapproaches AT aubinsophie lexicaladaptationoflinkgrammartothebiomedicalsublanguageacomparativeevaluationofthreeapproaches AT nazarenkoadeline lexicaladaptationoflinkgrammartothebiomedicalsublanguageacomparativeevaluationofthreeapproaches |