Combining joint models for biomedical event extraction

BACKGROUND: We explore techniques for performing model combination between the UMass and Stanford biomedical event extraction systems. Both sub-components address event extraction as a structured prediction problem, and use dual decomposition (UMass) and parsing algorithms (Stanford) to find the bes...

Bibliographic Details
Main authors: McClosky, David; Riedel, Sebastian; Surdeanu, Mihai; McCallum, Andrew; Manning, Christopher D
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3395172/
https://www.ncbi.nlm.nih.gov/pubmed/22759463
http://dx.doi.org/10.1186/1471-2105-13-S11-S9
author McClosky, David
Riedel, Sebastian
Surdeanu, Mihai
McCallum, Andrew
Manning, Christopher D
collection PubMed
description BACKGROUND: We explore techniques for performing model combination between the UMass and Stanford biomedical event extraction systems. Both sub-components address event extraction as a structured prediction problem, and use dual decomposition (UMass) and parsing algorithms (Stanford) to find the best scoring event structure. Our primary focus is on stacking where the predictions from the Stanford system are used as features in the UMass system. For comparison, we look at simpler model combination techniques such as intersection and union which require only the outputs from each system and combine them directly.
RESULTS: First, we find that stacking substantially improves performance while intersection and union provide no significant benefits. Second, we investigate the graph properties of event structures and their impact on the combination of our systems. Finally, we trace the origins of events proposed by the stacked model to determine the role each system plays in different components of the output. We learn that, while stacking can propose novel event structures not seen in either base model, these events have extremely low precision. Removing these novel events improves our already state-of-the-art F1 to 56.6% on the test set of Genia (Task 1). Overall, the combined system formed via stacking ("FAUST") performed well in the BioNLP 2011 shared task. The FAUST system obtained 1st place in three out of four tasks: 1st place in Genia Task 1 (56.0% F1) and Task 2 (53.9%), 2nd place in the Epigenetics and Post-translational Modifications track (35.0%), and 1st place in the Infectious Diseases track (55.6%).
CONCLUSION: We present a state-of-the-art event extraction system that relies on the strengths of structured prediction and model combination through stacking. Akin to results on other tasks, stacking outperforms intersection and union and leads to very strong results. The utility of model combination hinges on complementary views of the data, and we show that our sub-systems capture different graph properties of event structures. Finally, by removing low precision novel events, we show that performance from stacking can be further improved.
format Online
Article
Text
id pubmed-3395172
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3395172 2012-07-16 Combining joint models for biomedical event extraction McClosky, David; Riedel, Sebastian; Surdeanu, Mihai; McCallum, Andrew; Manning, Christopher D BMC Bioinformatics Proceedings BioMed Central 2012-06-26 /pmc/articles/PMC3395172/ /pubmed/22759463 http://dx.doi.org/10.1186/1471-2105-13-S11-S9 Text en Copyright ©2012 McClosky et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Combining joint models for biomedical event extraction
topic Proceedings