TEES 2.2: Biomedical Event Extraction for Diverse Corpora
BACKGROUND: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency...
Main authors: | Björne, Jari; Salakoski, Tapio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642046/ https://www.ncbi.nlm.nih.gov/pubmed/26551925 http://dx.doi.org/10.1186/1471-2105-16-S16-S4 |
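
The description above calls TEES a graph-generation approach to event extraction. The sketch below is only an illustration of that general idea, not the TEES data model: entities and event triggers are nodes, argument roles are directed edges, and events are read off the trigger nodes. The class names (`Node`, `SentenceGraph`), the role labels, and the example sentence content are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A text span: either a named entity or an event trigger (hypothetical)."""
    id: str
    text: str
    type: str
    is_trigger: bool = False


@dataclass
class SentenceGraph:
    """Nodes plus directed, role-labelled argument edges (hypothetical)."""
    nodes: dict = field(default_factory=dict)   # node id -> Node
    edges: list = field(default_factory=list)   # (source id, role, target id)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, source, role, target):
        self.edges.append((source, role, target))

    def events(self):
        """Return one (type, trigger text, arguments) tuple per trigger node
        that has at least one outgoing argument edge."""
        found = []
        for node in self.nodes.values():
            if not node.is_trigger:
                continue
            args = [(role, self.nodes[target].text)
                    for source, role, target in self.edges
                    if source == node.id]
            if args:
                found.append((node.type, node.text, args))
        return found


# Invented example: a regulation event whose theme is another event.
graph = SentenceGraph()
graph.add_node(Node("T1", "STAT3", "Protein"))
graph.add_node(Node("T2", "Vav", "Protein"))
graph.add_node(Node("T3", "phosphorylation", "Phosphorylation", is_trigger=True))
graph.add_node(Node("T4", "regulated", "Regulation", is_trigger=True))
graph.add_edge("T3", "Theme", "T1")
graph.add_edge("T4", "Theme", "T3")
graph.add_edge("T4", "Cause", "T2")

for event in graph.events():
    print(event)
```

Because an argument edge may point at another trigger node, nested events fall out of the same representation without a separate structure.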
_version_ | 1782400293722390528 |
---|---|
author | Björne, Jari Salakoski, Tapio |
author_sort | Björne, Jari |
collection | PubMed |
description | BACKGROUND: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. RESULTS: The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event types, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, making a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. CONCLUSIONS: The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented. |
format | Online Article Text |
id | pubmed-4642046 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4642046 2015-11-19 TEES 2.2: Biomedical Event Extraction for Diverse Corpora Björne, Jari Salakoski, Tapio BMC Bioinformatics Research
BioMed Central 2015-10-30 /pmc/articles/PMC4642046/ /pubmed/26551925 http://dx.doi.org/10.1186/1471-2105-16-S16-S4 Text en Copyright © 2015 Björne and Salakoski http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Björne, Jari Salakoski, Tapio TEES 2.2: Biomedical Event Extraction for Diverse Corpora |
title | TEES 2.2: Biomedical Event Extraction for Diverse Corpora |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642046/ https://www.ncbi.nlm.nih.gov/pubmed/26551925 http://dx.doi.org/10.1186/1471-2105-16-S16-S4 |
work_keys_str_mv | AT bjornejari tees22biomedicaleventextractionfordiversecorpora AT salakoskitapio tees22biomedicaleventextractionfordiversecorpora |
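
The abstract mentions using a scikit-learn ensemble method to analyze feature importances. The record does not say which ensemble method was used, so the sketch below assumes `ExtraTreesClassifier` purely for illustration. The feature names and the toy data are invented: one feature determines the label, and the other two are unrelated to it.

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical toy data: every combination of three binary features.
# Only "dep_path=nsubj" determines the label; the other two are noise.
examples, labels = [], []
for has_dep in (0, 1):
    for noise_a in (0, 1):
        for noise_b in (0, 1):
            examples.append({
                "dep_path=nsubj": has_dep,
                "token_len_even": noise_a,
                "sentence_start": noise_b,
            })
            labels.append("event" if has_dep else "neg")

# Turn the feature dictionaries into a dense numeric matrix.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(examples)

# Assumed ensemble method; the source does not name one.
model = ExtraTreesClassifier(n_estimators=50, random_state=0)
model.fit(X, labels)

# Pair each feature name with its importance and sort, highest first.
ranked = sorted(
    zip(vectorizer.feature_names_, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, weight in ranked:
    print(f"{name}\t{weight:.3f}")
```

Tree ensembles of this kind report impurity-based importances normalized to sum to one, which makes them convenient for ranking features, though the scores can be biased toward high-cardinality features on real data.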