
Semi-supervised method for biomedical event extraction

BACKGROUND: Biomedical extraction based on supervised machine learning still faces the problem that a limited labeled dataset does not saturate the learning method. Many supervised learning algorithms for bio-event extraction have been affected by the data sparseness. METHODS: In this study, a semi-...

Bibliographic Details
Main authors: Wang, Jian; Xu, Qian; Lin, Hongfei; Yang, Zhihao; Li, Yanpeng
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909242/
https://www.ncbi.nlm.nih.gov/pubmed/24565105
http://dx.doi.org/10.1186/1477-5956-11-S1-S17
_version_ 1782301810772410368
author Wang, Jian
Xu, Qian
Lin, Hongfei
Yang, Zhihao
Li, Yanpeng
author_facet Wang, Jian
Xu, Qian
Lin, Hongfei
Yang, Zhihao
Li, Yanpeng
author_sort Wang, Jian
collection PubMed
description BACKGROUND: Biomedical extraction based on supervised machine learning still faces the problem that a limited labeled dataset does not saturate the learning method. Many supervised learning algorithms for bio-event extraction have been affected by data sparseness. METHODS: In this study, a semi-supervised method that combines labeled data with large-scale unlabeled data is presented to improve the performance of biomedical event extraction. We propose a rich feature vector that includes a variety of syntactic and semantic features, such as N-gram features, walk subsequence features, and predicate argument structure (PAS) features, and in particular some new features derived from a strategy named Event Feature Coupling Generalization (EFCG). The EFCG algorithm creates useful event recognition features by exploiting the correlation between two sorts of original features drawn from the labeled data, where the correlation is computed with the help of massive amounts of unlabeled data. The EFCG approach aims to solve the data sparseness problem caused by a limited annotated corpus, and enables the new features to cover much more event-related information with better generalization properties. RESULTS: The effectiveness of our event extraction system is evaluated on datasets from the BioNLP Shared Task 2011 and PubMed. Experimental results demonstrate state-of-the-art performance in the fine-grained biomedical information extraction task. CONCLUSIONS: Limited labeled data can be combined with unlabeled data to tackle the data sparseness problem by means of our EFCG approach, and the classification capability of the model was enhanced by building a rich feature set from both labeled and unlabeled datasets. This semi-supervised learning approach could therefore go far towards improving the performance of the event extraction system. To the best of our knowledge, it was the first attempt at combining labeled and unlabeled data for tasks related to biomedical event extraction.
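The abstract says only that EFCG builds new features from the correlation between two sorts of original features, with the correlation estimated on unlabeled data. It does not say how the correlation is measured. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes pointwise mutual information (PMI) as the correlation measure, document-level co-occurrence counts, and averaging as the way to combine scores. The names `pmi_table`, `coupled_features`, `anchor_feats`, `context_feats`, and the `efcg:` prefix are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def pmi_table(unlabeled_docs, anchor_feats, context_feats):
    """Estimate PMI for (context, anchor) feature pairs from unlabeled documents.

    Assumption: each document is a list of feature strings, and co-occurrence
    means "both features appear in the same document". The two feature sets
    are assumed to be disjoint.
    """
    n_docs = len(unlabeled_docs)
    single = Counter()  # number of documents containing each feature
    joint = Counter()   # number of documents containing both features of a pair
    for doc in unlabeled_docs:
        present = set(doc)
        anchors_here = present & anchor_feats
        contexts_here = present & context_feats
        single.update(anchors_here | contexts_here)
        joint.update(product(contexts_here, anchors_here))
    # PMI(c, a) = log( P(c, a) / (P(c) * P(a)) ), using document frequencies.
    return {
        (c, a): math.log((n_ca * n_docs) / (single[c] * single[a]))
        for (c, a), n_ca in joint.items()
    }


def coupled_features(example_feats, anchor_feats, table):
    """Turn the sparse features of one labeled example into per-anchor scores.

    For each anchor feature, average the PMI of every example feature that
    co-occurred with it in the unlabeled data; unseen pairs are skipped.
    """
    scores = {}
    for anchor in sorted(anchor_feats):
        vals = [table[(f, anchor)] for f in example_feats if (f, anchor) in table]
        if vals:
            scores["efcg:" + anchor] = sum(vals) / len(vals)
    return scores


if __name__ == "__main__":
    # Toy unlabeled corpus (invented for illustration only).
    docs = [
        ["phosphorylation", "kinase"],
        ["phosphorylation", "kinase"],
        ["expression", "promoter"],
        ["expression", "promoter"],
    ]
    anchors = {"phosphorylation", "expression"}
    contexts = {"kinase", "promoter"}
    table = pmi_table(docs, anchors, contexts)
    print(coupled_features({"kinase"}, anchors, table))
```

In this sketch the generated scores would be appended to the ordinary feature vector of each labeled example. A feature that is rare in the labeled set can then still contribute through its co-occurrence statistics in the larger unlabeled corpus, which is the kind of generalization the abstract describes.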
format Online
Article
Text
id pubmed-3909242
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3909242 2014-02-13 Semi-supervised method for biomedical event extraction Wang, Jian Xu, Qian Lin, Hongfei Yang, Zhihao Li, Yanpeng Proteome Sci Research BioMed Central 2013-11-07 /pmc/articles/PMC3909242/ /pubmed/24565105 http://dx.doi.org/10.1186/1477-5956-11-S1-S17 Text en Copyright © 2013 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Wang, Jian
Xu, Qian
Lin, Hongfei
Yang, Zhihao
Li, Yanpeng
Semi-supervised method for biomedical event extraction
title Semi-supervised method for biomedical event extraction
title_full Semi-supervised method for biomedical event extraction
title_fullStr Semi-supervised method for biomedical event extraction
title_full_unstemmed Semi-supervised method for biomedical event extraction
title_short Semi-supervised method for biomedical event extraction
title_sort semi-supervised method for biomedical event extraction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3909242/
https://www.ncbi.nlm.nih.gov/pubmed/24565105
http://dx.doi.org/10.1186/1477-5956-11-S1-S17
work_keys_str_mv AT wangjian semisupervisedmethodforbiomedicaleventextraction
AT xuqian semisupervisedmethodforbiomedicaleventextraction
AT linhongfei semisupervisedmethodforbiomedicaleventextraction
AT yangzhihao semisupervisedmethodforbiomedicaleventextraction
AT liyanpeng semisupervisedmethodforbiomedicaleventextraction