PASBio: predicate-argument structures for event extraction in molecular biology

BACKGROUND: The exploitation of information extraction (IE), a technology aiming to provide instances of structured representations from free-form text, has been rapidly growing within the molecular biology (MB) research community to keep track of the latest results reported in literature. IE system...

Bibliographic Details
Main authors: Wattarujeekrit, Tuangthong, Shah, Parantu K, Collier, Nigel
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC535924/
https://www.ncbi.nlm.nih.gov/pubmed/15494078
http://dx.doi.org/10.1186/1471-2105-5-155
author Wattarujeekrit, Tuangthong
Shah, Parantu K
Collier, Nigel
collection PubMed
description BACKGROUND: The exploitation of information extraction (IE), a technology aiming to provide instances of structured representations from free-form text, has been rapidly growing within the molecular biology (MB) research community to keep track of the latest results reported in the literature. IE systems have traditionally used shallow syntactic patterns for matching facts in sentences, but such approaches appear inadequate to achieve high accuracy in MB event extraction due to complex sentence structure. A consensus in the IE community is emerging on the necessity of exploiting deeper knowledge structures, such as the relations between a verb and its arguments shown by predicate-argument structure (PAS). PAS is of interest as structures typically correspond to events of interest and their participating entities. For this to be realized within IE, a key knowledge component is the definition of PAS frames. PAS frames for non-technical domains such as newswire are already being constructed in several projects such as PropBank, VerbNet, and FrameNet. Knowledge from PAS should enable more accurate applications in several areas where sentence understanding is required, like machine translation and text summarization. In this article, we explore the need to adapt PAS for the MB domain and specify PAS frames to support IE, as well as outlining the major issues that require consideration in their construction. RESULTS: We introduce PASBio by extending a model based on PropBank to the MB domain. The hypothesis we explore is that PAS holds the key for understanding relationships describing the roles of genes and gene products in mediating their biological functions. We chose predicates describing gene expression, molecular interactions and signal transduction events with the aim of covering a number of research areas in MB. Analysis was performed on sentences containing a set of verbal predicates from MEDLINE and full text journals. Results confirm the necessity to analyze PAS specifically for the MB domain. CONCLUSIONS: At present PASBio contains the analyzed PAS of over 30 verbs, publicly available on the Internet for use in advanced applications. In the future we aim to expand the knowledge base to cover more verbs and the nominal form of each predicate.
format Text
id pubmed-535924
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-535924 2004-12-18 PASBio: predicate-argument structures for event extraction in molecular biology Wattarujeekrit, Tuangthong Shah, Parantu K Collier, Nigel BMC Bioinformatics Research Article BioMed Central 2004-10-19 /pmc/articles/PMC535924/ /pubmed/15494078 http://dx.doi.org/10.1186/1471-2105-5-155 Text en Copyright © 2004 Wattarujeekrit et al; licensee BioMed Central Ltd.
title PASBio: predicate-argument structures for event extraction in molecular biology
topic Research Article