
Semantic role labeling for protein transport predicates

BACKGROUND: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data from newswire domains. Here, we report on an SRL model for identifying th...


Bibliographic Details
Main authors: Bethard, Steven, Lu, Zhiyong, Martin, James H, Hunter, Lawrence
Format: Text
Language: English
Published: BioMed Central 2008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2474622/
https://www.ncbi.nlm.nih.gov/pubmed/18547432
http://dx.doi.org/10.1186/1471-2105-9-277
author Bethard, Steven
Lu, Zhiyong
Martin, James H
Hunter, Lawrence
collection PubMed
description BACKGROUND: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in recent years, but mostly with data from newswire domains. Here, we report on an SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs – manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match up with syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role.
RESULTS: We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones.
CONCLUSION: We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from any previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part-of-speech tags, we were able to address the challenges posed by the new domain data and subsequently build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.
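The word-chunking formulation described in the abstract, where each word is labeled as beginning, inside or outside a role, can be illustrated with a small sketch. This is not the authors' code: the example sentence, the protein name `ProtA`, the span offsets, and the helper names `spans_to_bio` and `f_measure` are all assumptions made for illustration, and no classifier is trained here. The second helper only shows the standard balanced F-measure arithmetic applied to the precision and recall figures reported above.

```python
from typing import List, Tuple

# A role span: (start index, end index exclusive, role name).
# The role names follow the abstract; the span format is an assumption.
Span = Tuple[int, int, str]


def spans_to_bio(n_words: int, spans: List[Span]) -> List[str]:
    """Convert role spans into per-word B/I/O labels (word chunking)."""
    labels = ["O"] * n_words               # default: outside any role
    for start, end, role in spans:
        labels[start] = f"B-{role}"        # first word of the role
        for i in range(start + 1, end):
            labels[i] = f"I-{role}"        # remaining words of the role
    return labels


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence, not taken from the GeneRIF corpus.
words = "ProtA translocates from the cytoplasm to the nucleus".split()
spans = [(0, 1, "PATIENT"), (3, 5, "ORIGIN"), (6, 8, "DESTINATION")]

for word, label in zip(words, spans_to_bio(len(words), spans)):
    print(f"{word}\t{label}")

# Precision and recall with manually annotated protein boundaries,
# as reported in the abstract.
print(round(f_measure(87.6, 79.0), 1))
```

A classifier in this paradigm predicts one such label per word, so role boundaries do not have to coincide with syntactic phrase boundaries.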
format Text
id pubmed-2474622
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2474622 2008-07-18 Semantic role labeling for protein transport predicates Bethard, Steven Lu, Zhiyong Martin, James H Hunter, Lawrence BMC Bioinformatics Research Article BioMed Central 2008-06-11 /pmc/articles/PMC2474622/ /pubmed/18547432 http://dx.doi.org/10.1186/1471-2105-9-277 Text en Copyright © 2008 Bethard et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Semantic role labeling for protein transport predicates
topic Research Article