
Active learning for ontological event extraction incorporating named entity recognition and unknown word handling

BACKGROUND: Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Acti...


Bibliographic Details
Main Authors:	Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4849099/ https://www.ncbi.nlm.nih.gov/pubmed/27127603 http://dx.doi.org/10.1186/s13326-016-0059-z

author	Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong
collection	PubMed
description	BACKGROUND: Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Active learning is to choose the most informative documents for the supervised learning in order to reduce the amount of required manual annotations. Previous works of active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. They also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. METHODS: Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method for the task of named entity recognition. 
RESULTS AND CONCLUSION: We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method into named entity recognition tasks also improves the document selection for manual annotation of named entities.
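The selection procedure summarized in the METHODS part of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring formula, so the product-based score, the string-similarity back-off for unknown words (`difflib.SequenceMatcher`), the `min_sim` threshold, and every function and parameter name below are assumptions made for illustration. The event extractor is passed in as a hypothetical callable, and the named entity recognizer is reduced to a lexicon lookup.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Set


def trigger_probability(word: str, trigger_prob: Dict[str, float],
                        min_sim: float = 0.8) -> float:
    """Probability that `word` expresses an event.

    Unknown words back off to the most similar known word, discounted by
    the similarity. Character-level similarity is an assumed stand-in for
    the "word similarity measures" mentioned in the abstract.
    """
    if word in trigger_prob:
        return trigger_prob[word]
    best_sim, best_prob = 0.0, 0.0
    for known, prob in trigger_prob.items():
        sim = SequenceMatcher(None, word, known).ratio()
        if sim > best_sim:
            best_sim, best_prob = sim, prob
    return best_prob * best_sim if best_sim >= min_sim else 0.0


def informativity(tokens: List[str], trigger_prob: Dict[str, float],
                  entity_lexicon: Set[str]) -> float:
    """Assumed score: event-word evidence times entity evidence, per token."""
    event_evidence = sum(trigger_probability(t, trigger_prob) for t in tokens)
    entity_evidence = sum(1 for t in tokens if t in entity_lexicon)
    return event_evidence * entity_evidence / max(len(tokens), 1)


def rank_candidates(docs: Dict[str, List[str]],
                    extract_events: Callable[[List[str]], list],
                    trigger_prob: Dict[str, float],
                    entity_lexicon: Set[str]) -> List[str]:
    """Keep documents with no extracted event (potential false negatives)
    and rank them from most to least informative."""
    candidates = {doc_id: toks for doc_id, toks in docs.items()
                  if not extract_events(toks)}
    return sorted(
        candidates,
        key=lambda d: informativity(candidates[d], trigger_prob, entity_lexicon),
        reverse=True,
    )
```

In this sketch a document ranks highly only when it contains both event-like words and candidate event arguments, which mirrors the abstract's point that evidence of event participants can signal events the extractor missed.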
format	Online Article Text
id	pubmed-4849099
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4849099 2016-04-29 Active learning for ontological event extraction incorporating named entity recognition and unknown word handling Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong J Biomed Semantics Research BioMed Central 2016-04-27 /pmc/articles/PMC4849099/ /pubmed/27127603 http://dx.doi.org/10.1186/s13326-016-0059-z Text en © Han et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Active learning for ontological event extraction incorporating named entity recognition and unknown word handling
topic	Research
