TrigNER: automatically optimized biomedical event trigger recognition on scientific documents
BACKGROUND: Cellular events play a central role in the understanding of biological processes and functions, providing insight on both physiological and pathogenesis mechanisms. Automatic extraction of mentions of such events from the literature represents an important contribution to the progress of...
Main Authors: | Campos, David; Bui, Quoc-Chinh; Matos, Sérgio; Oliveira, José Luís |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Software Review |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3896761/ https://www.ncbi.nlm.nih.gov/pubmed/24401704 http://dx.doi.org/10.1186/1751-0473-9-1 |
_version_ | 1782300128487407616 |
---|---|
author | Campos, David Bui, Quoc-Chinh Matos, Sérgio Oliveira, José Luís |
author_facet | Campos, David Bui, Quoc-Chinh Matos, Sérgio Oliveira, José Luís |
author_sort | Campos, David |
collection | PubMed |
description | BACKGROUND: Cellular events play a central role in the understanding of biological processes and functions, providing insight on both physiological and pathogenesis mechanisms. Automatic extraction of mentions of such events from the literature represents an important contribution to the progress of the biomedical domain, allowing faster updating of existing knowledge. The identification of trigger words indicating an event is a very important step in the event extraction pipeline, since the following task(s) rely on its output. This step presents various complex and unsolved challenges, namely the selection of informative features, the representation of the textual context, and the selection of a specific event type for a trigger word given this context. RESULTS: We propose TrigNER, a machine learning-based solution for biomedical event trigger recognition, which takes advantage of Conditional Random Fields (CRFs) with a high-end feature set, including linguistic-based, orthographic, morphological, local context and dependency parsing features. Additionally, a completely configurable algorithm is used to automatically optimize the feature set and training parameters for each event type. Thus, it automatically selects the features that have a positive contribution and automatically optimizes the CRF model order, n-grams sizes, vertex information and maximum hops for dependency parsing features. The final output consists of various CRF models, each one optimized to the linguistic characteristics of each event type. CONCLUSIONS: TrigNER was tested in the BioNLP 2009 shared task corpus, achieving a total F-measure of 62.7 and outperforming existing solutions on various event trigger types, namely gene expression, transcription, protein catabolism, phosphorylation and binding. The proposed solution allows researchers to easily apply complex and optimized techniques in the recognition of biomedical event triggers, making its application a simple routine task. We believe this work is an important contribution to the biomedical text mining community, contributing to improved and faster event recognition on scientific articles, and consequent hypothesis generation and knowledge discovery. This solution is freely available as open source at http://bioinformatics.ua.pt/trigner. |
format | Online Article Text |
id | pubmed-3896761 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-38967612014-01-31 TrigNER: automatically optimized biomedical event trigger recognition on scientific documents Campos, David Bui, Quoc-Chinh Matos, Sérgio Oliveira, José Luís Source Code Biol Med Software Review BACKGROUND: Cellular events play a central role in the understanding of biological processes and functions, providing insight on both physiological and pathogenesis mechanisms. Automatic extraction of mentions of such events from the literature represents an important contribution to the progress of the biomedical domain, allowing faster updating of existing knowledge. The identification of trigger words indicating an event is a very important step in the event extraction pipeline, since the following task(s) rely on its output. This step presents various complex and unsolved challenges, namely the selection of informative features, the representation of the textual context, and the selection of a specific event type for a trigger word given this context. RESULTS: We propose TrigNER, a machine learning-based solution for biomedical event trigger recognition, which takes advantage of Conditional Random Fields (CRFs) with a high-end feature set, including linguistic-based, orthographic, morphological, local context and dependency parsing features. Additionally, a completely configurable algorithm is used to automatically optimize the feature set and training parameters for each event type. Thus, it automatically selects the features that have a positive contribution and automatically optimizes the CRF model order, n-grams sizes, vertex information and maximum hops for dependency parsing features. The final output consists of various CRF models, each one optimized to the linguistic characteristics of each event type. CONCLUSIONS: TrigNER was tested in the BioNLP 2009 shared task corpus, achieving a total F-measure of 62.7 and outperforming existing solutions on various event trigger types, namely gene expression, transcription, protein catabolism, phosphorylation and binding. The proposed solution allows researchers to easily apply complex and optimized techniques in the recognition of biomedical event triggers, making its application a simple routine task. We believe this work is an important contribution to the biomedical text mining community, contributing to improved and faster event recognition on scientific articles, and consequent hypothesis generation and knowledge discovery. This solution is freely available as open source at http://bioinformatics.ua.pt/trigner. BioMed Central 2014-01-08 /pmc/articles/PMC3896761/ /pubmed/24401704 http://dx.doi.org/10.1186/1751-0473-9-1 Text en Copyright © 2014 Campos et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Review Campos, David Bui, Quoc-Chinh Matos, Sérgio Oliveira, José Luís TrigNER: automatically optimized biomedical event trigger recognition on scientific documents |
title | TrigNER: automatically optimized biomedical event trigger recognition on scientific documents |
title_full | TrigNER: automatically optimized biomedical event trigger recognition on scientific documents |
title_fullStr | TrigNER: automatically optimized biomedical event trigger recognition on scientific documents |
title_full_unstemmed | TrigNER: automatically optimized biomedical event trigger recognition on scientific documents |
title_short | TrigNER: automatically optimized biomedical event trigger recognition on scientific documents |
title_sort | trigner: automatically optimized biomedical event trigger recognition on scientific documents |
topic | Software Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3896761/ https://www.ncbi.nlm.nih.gov/pubmed/24401704 http://dx.doi.org/10.1186/1751-0473-9-1 |
work_keys_str_mv | AT camposdavid trignerautomaticallyoptimizedbiomedicaleventtriggerrecognitiononscientificdocuments AT buiquocchinh trignerautomaticallyoptimizedbiomedicaleventtriggerrecognitiononscientificdocuments AT matossergio trignerautomaticallyoptimizedbiomedicaleventtriggerrecognitiononscientificdocuments AT oliveirajoseluis trignerautomaticallyoptimizedbiomedicaleventtriggerrecognitiononscientificdocuments |
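
The description field says the tool "automatically selects the features that have a positive contribution" for each event type. The sketch below illustrates one common way such a selection could work, a greedy forward pass that keeps a feature only if it raises the score. It is not taken from TrigNER itself: the function `select_features`, the scorer `toy_evaluate`, the feature names and the gain values are all hypothetical, and a real scorer would train a CRF model and measure its F-measure on held-out data.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def select_features(
    candidates: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], int],
) -> Tuple[List[str], int]:
    """Greedy forward selection: keep a feature only if it improves the score.

    `evaluate` is a stand-in (assumption) for "train a model with this
    feature set and return its score"; here it returns an integer score.
    """
    selected: List[str] = []
    best = evaluate(frozenset())  # baseline score with no optional features
    for feature in candidates:
        score = evaluate(frozenset(selected + [feature]))
        if score > best:  # positive contribution: keep the feature
            selected.append(feature)
            best = score
    return selected, best


# Hypothetical per-feature gains, used only to make the sketch runnable.
TOY_GAINS = {"lemma": 30, "pos": 5, "capitalization": -2, "dependency_hops": 10}


def toy_evaluate(features: FrozenSet[str]) -> int:
    """Toy scorer (assumption): the score is the sum of the fixed gains."""
    return sum(TOY_GAINS[name] for name in features)


if __name__ == "__main__":
    chosen, score = select_features(list(TOY_GAINS), toy_evaluate)
    print(chosen, score)
```

Running the selection once per event type, each with its own scorer, would yield one feature set per event type, which matches the per-event-type models the description mentions. The greedy order matters in general; this sketch simply visits candidates in the order given.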