TEES 2.2: Biomedical Event Extraction for Diverse Corpora

BACKGROUND: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency...

Bibliographic Details
Main Authors: Björne, Jari; Salakoski, Tapio
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642046/
https://www.ncbi.nlm.nih.gov/pubmed/26551925
http://dx.doi.org/10.1186/1471-2105-16-S16-S4
_version_ 1782400293722390528
author Björne, Jari
Salakoski, Tapio
author_facet Björne, Jari
Salakoski, Tapio
author_sort Björne, Jari
collection PubMed
description BACKGROUND: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. RESULTS: The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. CONCLUSIONS: The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.
format Online
Article
Text
id pubmed-4642046
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4642046 2015-11-19 TEES 2.2: Biomedical Event Extraction for Diverse Corpora Björne, Jari Salakoski, Tapio BMC Bioinformatics Research BACKGROUND: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. RESULTS: The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. CONCLUSIONS: The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.
BioMed Central 2015-10-30 /pmc/articles/PMC4642046/ /pubmed/26551925 http://dx.doi.org/10.1186/1471-2105-16-S16-S4 Text en Copyright © 2015 Björne and Salakoski http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Björne, Jari
Salakoski, Tapio
TEES 2.2: Biomedical Event Extraction for Diverse Corpora
title TEES 2.2: Biomedical Event Extraction for Diverse Corpora
title_full TEES 2.2: Biomedical Event Extraction for Diverse Corpora
title_fullStr TEES 2.2: Biomedical Event Extraction for Diverse Corpora
title_full_unstemmed TEES 2.2: Biomedical Event Extraction for Diverse Corpora
title_short TEES 2.2: Biomedical Event Extraction for Diverse Corpora
title_sort tees 2.2: biomedical event extraction for diverse corpora
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642046/
https://www.ncbi.nlm.nih.gov/pubmed/26551925
http://dx.doi.org/10.1186/1471-2105-16-S16-S4
work_keys_str_mv AT bjornejari tees22biomedicaleventextractionfordiversecorpora
AT salakoskitapio tees22biomedicaleventextractionfordiversecorpora