
BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features

BACKGROUND: Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use...

Bibliographic Details
Main Authors: Tsai, Richard Tzong-Han, Chou, Wen-Chi, Su, Ying-Shan, Lin, Yu-Chun, Sung, Cheng-Lung, Dai, Hong-Jie, Yeh, Irene Tzu-Hsuan, Ku, Wei, Sung, Ting-Yi, Hsu, Wen-Lian
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2072962/
https://www.ncbi.nlm.nih.gov/pubmed/17764570
http://dx.doi.org/10.1186/1471-2105-8-325
_version_ 1782137811861766144
author Tsai, Richard Tzong-Han
Chou, Wen-Chi
Su, Ying-Shan
Lin, Yu-Chun
Sung, Cheng-Lung
Dai, Hong-Jie
Yeh, Irene Tzu-Hsuan
Ku, Wei
Sung, Ting-Yi
Hsu, Wen-Lian
collection PubMed
description BACKGROUND: Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have thus been developed for use in the biomedical field. A key IE task in this field is the extraction of biomedical relations, such as protein-protein and gene-disease interactions. However, most biomedical relation extraction systems ignore adverbial and prepositional phrases and words identifying location, manner, timing, and condition, which are essential for describing biomedical relations. Semantic role labeling (SRL) is a natural language processing technique that identifies the semantic roles of these words or phrases in sentences and expresses them as predicate-argument structures. We construct a biomedical SRL system called BIOSMILE that uses a maximum entropy (ME) machine-learning model to extract biomedical relations. BIOSMILE is trained on BioProp, our semi-automatic, annotated biomedical proposition bank. Currently, we are focusing on 30 biomedical verbs that are frequently used or considered important for describing molecular events.
RESULTS: To evaluate the performance of BIOSMILE, we conducted two experiments to (1) compare the performance of SRL systems trained on newswire and biomedical corpora; and (2) examine the effects of using biomedical-specific features. The experimental results show that using BioProp improves the F-score of the SRL system by 21.45% over an SRL system that uses a newswire corpus. It is noteworthy that adding automatically generated template features improves the overall F-score by a further 0.52%. Specifically, ArgM-LOC, ArgM-MNR, and Arg2 achieve statistically significant performance improvements of 3.33%, 2.27%, and 1.44%, respectively.
CONCLUSION: We demonstrate the necessity of using a biomedical proposition bank for training SRL systems in the biomedical domain. Besides the different characteristics of biomedical and newswire sentences, factors such as cross-domain framesets and verb usage variations also influence the performance of SRL systems. For argument classification, we find that NE (named entity) features indicating if the target node matches with NEs are not effective, since NEs may match with a node of the parsing tree that does not have semantic role labels in the training set. We therefore incorporate templates composed of specific words, NE types, and POS tags into the SRL system. As a result, the classification accuracy for adjunct arguments, which is especially important for biomedical SRL, is improved significantly.
format Text
id pubmed-2072962
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2072962 2007-11-10 BMC Bioinformatics Research Article BioMed Central 2007-09-01 /pmc/articles/PMC2072962/ /pubmed/17764570 http://dx.doi.org/10.1186/1471-2105-8-325 Text en Copyright © 2007 Tsai et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title BIOSMILE: A semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2072962/
https://www.ncbi.nlm.nih.gov/pubmed/17764570
http://dx.doi.org/10.1186/1471-2105-8-325