Simple tricks for improving pattern-based information extraction from the biomedical literature

BACKGROUND: Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns...

Bibliographic Details
Main authors: Nguyen, Quang Long; Tikk, Domonkos; Leser, Ulf
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955645/
https://www.ncbi.nlm.nih.gov/pubmed/20868467
http://dx.doi.org/10.1186/2041-1480-1-9
author Nguyen, Quang Long
Tikk, Domonkos
Leser, Ulf
collection PubMed
description BACKGROUND: Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns. RESULTS: We propose several techniques for filtering sets of automatically generated patterns and analyze their effectiveness for different extraction tasks, as defined in the recent BioNLP 2009 shared task. We focus on simple methods that only take into account the complexity of the pattern and the complexity of the texts the patterns are applied to. We show that our techniques, despite their simplicity, yield large improvements in all tasks we analyzed. For instance, they raise the F-score for the task of extracting gene expression events from 24.8% to 51.9%. CONCLUSIONS: Even very simple filtering techniques can significantly improve the F-score of an information extraction method based on automatically generated patterns. Furthermore, the application of such methods yields a considerable speed-up, as fewer matches need to be analyzed. Due to their simplicity, the proposed filtering techniques should also be applicable to other methods using linguistic patterns for information extraction.
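Illustrative note: the abstract says only that the filters consider the complexity of the pattern and of the text, and that results are reported as F-scores. The sketch below is one possible reading, not the authors' method. It assumes that pattern complexity can be approximated by token count and text complexity by sentence length in words. The `Pattern` class, the function names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical stand-in for an automatically generated extraction pattern."""
    tokens: tuple


def filter_patterns(patterns, max_tokens):
    """Keep only patterns whose token count does not exceed max_tokens (assumed complexity measure)."""
    return [p for p in patterns if len(p.tokens) <= max_tokens]


def filter_sentences(sentences, max_words):
    """Keep only sentences with at most max_words whitespace-separated words (assumed complexity measure)."""
    return [s for s in sentences if len(s.split()) <= max_words]


def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
patterns = [
    Pattern(("GENE", "is", "expressed")),
    Pattern(("expression", "of", "GENE", "was", "detected", "in", "cells")),
]
sentences = [
    "GENE is expressed in liver",
    "a much longer sentence that a length filter with a small threshold would discard",
]

kept_patterns = filter_patterns(patterns, max_tokens=4)
kept_sentences = filter_sentences(sentences, max_words=8)
```

Fewer patterns and fewer candidate sentences mean fewer matches to evaluate, which is consistent with the speed-up the abstract mentions. The thresholds that produced the reported F-scores are not given in this record.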
format Text
id pubmed-2955645
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2955645 2010-10-16 Simple tricks for improving pattern-based information extraction from the biomedical literature Nguyen, Quang Long Tikk, Domonkos Leser, Ulf J Biomed Semantics Research BioMed Central 2010-09-24 /pmc/articles/PMC2955645/ /pubmed/20868467 http://dx.doi.org/10.1186/2041-1480-1-9 Text en Copyright ©2010 Nguyen et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Simple tricks for improving pattern-based information extraction from the biomedical literature
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955645/
https://www.ncbi.nlm.nih.gov/pubmed/20868467
http://dx.doi.org/10.1186/2041-1480-1-9