
A resource-saving collective approach to biomedical semantic role labeling

BACKGROUND: Biomedical semantic role labeling (BioSRL) is a natural language processing technique that identifies the semantic roles of the words or phrases in sentences describing biological processes and expresses them as predicate-argument structures (PAS’s). Currently, a major problem of BioSRL...


Bibliographic Details
Main Authors:	Tsai, Richard Tzong-Han, Lai, Po-Ting
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062501/ https://www.ncbi.nlm.nih.gov/pubmed/24884358 http://dx.doi.org/10.1186/1471-2105-15-160

_version_	1782321662906073088
author	Tsai, Richard Tzong-Han Lai, Po-Ting
author_facet	Tsai, Richard Tzong-Han Lai, Po-Ting
author_sort	Tsai, Richard Tzong-Han
collection	PubMed
description	BACKGROUND: Biomedical semantic role labeling (BioSRL) is a natural language processing technique that identifies the semantic roles of the words or phrases in sentences describing biological processes and expresses them as predicate-argument structures (PAS’s). Currently, a major problem of BioSRL is that most systems label every node in a full parse tree independently; however, some nodes always exhibit dependency. In general SRL, collective approaches based on the Markov logic network (MLN) model have been successful in dealing with this problem. However, in BioSRL such an approach has not been attempted because it would require more training data to recognize the more specialized and diverse terms found in biomedical literature, increasing training time and computational complexity. RESULTS: We first constructed a collective BioSRL system based on MLN. This system, called collective BIOSMILE (CBIOSMILE), is trained on the BioProp corpus. To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes in the tree. Nodes not recognized by any candidate identifier are discarded. The pruned annotated parse trees are used to train a resource-saving MLN-based system, which is referred to as resource-saving collective BIOSMILE (RCBIOSMILE). Our experimental results show that our proposed CBIOSMILE system outperforms BIOSMILE, which is the top BioSRL system. Furthermore, our proposed RCBIOSMILE maintains the same level of accuracy as CBIOSMILE using 92% less memory and 57% less training time. CONCLUSIONS: This greatly improved efficiency makes RCBIOSMILE potentially suitable for training on much larger BioSRL corpora over more biomedical domains. Compared to real-world biomedical corpora, BioProp is relatively small, containing only 445 MEDLINE abstracts and 30 event triggers. It is not large enough for practical applications, such as pathway construction. We consider it of primary importance to pursue SRL training on large corpora in the future.
format	Online Article Text
id	pubmed-4062501
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4062501 2014-06-27 A resource-saving collective approach to biomedical semantic role labeling Tsai, Richard Tzong-Han Lai, Po-Ting BMC Bioinformatics Research Article BACKGROUND: Biomedical semantic role labeling (BioSRL) is a natural language processing technique that identifies the semantic roles of the words or phrases in sentences describing biological processes and expresses them as predicate-argument structures (PAS’s). Currently, a major problem of BioSRL is that most systems label every node in a full parse tree independently; however, some nodes always exhibit dependency. In general SRL, collective approaches based on the Markov logic network (MLN) model have been successful in dealing with this problem. However, in BioSRL such an approach has not been attempted because it would require more training data to recognize the more specialized and diverse terms found in biomedical literature, increasing training time and computational complexity. RESULTS: We first constructed a collective BioSRL system based on MLN. This system, called collective BIOSMILE (CBIOSMILE), is trained on the BioProp corpus. To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes in the tree. Nodes not recognized by any candidate identifier are discarded. The pruned annotated parse trees are used to train a resource-saving MLN-based system, which is referred to as resource-saving collective BIOSMILE (RCBIOSMILE). Our experimental results show that our proposed CBIOSMILE system outperforms BIOSMILE, which is the top BioSRL system. Furthermore, our proposed RCBIOSMILE maintains the same level of accuracy as CBIOSMILE using 92% less memory and 57% less training time. CONCLUSIONS: This greatly improved efficiency makes RCBIOSMILE potentially suitable for training on much larger BioSRL corpora over more biomedical domains. Compared to real-world biomedical corpora, BioProp is relatively small, containing only 445 MEDLINE abstracts and 30 event triggers. It is not large enough for practical applications, such as pathway construction. We consider it of primary importance to pursue SRL training on large corpora in the future. BioMed Central 2014-05-27 /pmc/articles/PMC4062501/ /pubmed/24884358 http://dx.doi.org/10.1186/1471-2105-15-160 Text en Copyright © 2014 Tsai and Lai; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Tsai, Richard Tzong-Han Lai, Po-Ting A resource-saving collective approach to biomedical semantic role labeling
title	A resource-saving collective approach to biomedical semantic role labeling
title_full	A resource-saving collective approach to biomedical semantic role labeling
title_fullStr	A resource-saving collective approach to biomedical semantic role labeling
title_full_unstemmed	A resource-saving collective approach to biomedical semantic role labeling
title_short	A resource-saving collective approach to biomedical semantic role labeling
title_sort	resource-saving collective approach to biomedical semantic role labeling
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062501/ https://www.ncbi.nlm.nih.gov/pubmed/24884358 http://dx.doi.org/10.1186/1471-2105-15-160
work_keys_str_mv	AT tsairichardtzonghan aresourcesavingcollectiveapproachtobiomedicalsemanticrolelabeling AT laipoting aresourcesavingcollectiveapproachtobiomedicalsemanticrolelabeling AT tsairichardtzonghan resourcesavingcollectiveapproachtobiomedicalsemanticrolelabeling AT laipoting resourcesavingcollectiveapproachtobiomedicalsemanticrolelabeling
