Simple tricks for improving pattern-based information extraction from the biomedical literature
BACKGROUND: Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns...
Main authors: | Nguyen, Quang Long; Tikk, Domonkos; Leser, Ulf |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955645/ https://www.ncbi.nlm.nih.gov/pubmed/20868467 http://dx.doi.org/10.1186/2041-1480-1-9 |
_version_ | 1782188058567770112 |
---|---|
author | Nguyen, Quang Long; Tikk, Domonkos; Leser, Ulf |
collection | PubMed |
description | BACKGROUND: Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns. RESULTS: We propose several techniques for filtering sets of automatically generated patterns and analyze their effectiveness for different extraction tasks, as defined in the recent BioNLP 2009 shared task. We focus on simple methods that only take into account the complexity of the pattern and the complexity of the texts the patterns are applied to. We show that our techniques, despite their simplicity, yield large improvements in all tasks we analyzed. For instance, they raise the F-score for the task of extracting gene expression events from 24.8% to 51.9%. CONCLUSIONS: Even very simple filtering techniques may significantly improve the F-score of an information extraction method based on automatically generated patterns. Furthermore, the application of such methods yields a considerable speed-up, as fewer matches need to be analyzed. Due to their simplicity, the proposed filtering techniques should also be applicable to other methods using linguistic patterns for information extraction. |
format | Text |
id | pubmed-2955645 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2955645 2010-10-16 Simple tricks for improving pattern-based information extraction from the biomedical literature Nguyen, Quang Long; Tikk, Domonkos; Leser, Ulf J Biomed Semantics Research BioMed Central 2010-09-24 /pmc/articles/PMC2955645/ /pubmed/20868467 http://dx.doi.org/10.1186/2041-1480-1-9 Text en Copyright ©2010 Nguyen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Simple tricks for improving pattern-based information extraction from the biomedical literature |
topic | Research |