Optimizing graph-based patterns to extract biomedical events from the literature

IN BIONLP-ST 2013: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) ta...

Bibliographic Details
Main authors: Liu, Haibin, Verspoor, Karin, Comeau, Donald C, MacKinlay, Andrew D, Wilbur, W John
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642081/
https://www.ncbi.nlm.nih.gov/pubmed/26551594
http://dx.doi.org/10.1186/1471-2105-16-S16-S2
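
The abstract in this record describes matching event patterns against sentence graphs by approximate subgraph isomorphism, with a penalty for skipped nodes and a distance threshold per pattern. The following is only a toy sketch of that idea, not the authors' algorithm: the distance used here (penalty times the number of intermediate sentence nodes on the shortest path between mapped nodes), the brute-force search, and all names such as `best_match`, `matches` and `skip_penalty` are assumptions made for illustration.

```python
from collections import deque
from itertools import permutations


def build_adj(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path_len(adj, src, dst):
    """Breadth-first hop count between two nodes, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def best_match(pattern_labels, pattern_edges, sent_labels, sent_edges,
               skip_penalty=1.0):
    """Return (distance, mapping) for the cheapest injective,
    label-preserving mapping of pattern nodes to sentence nodes, or None.
    Assumed toy distance: each pattern edge costs skip_penalty per
    sentence node skipped between its two mapped endpoints."""
    adj = build_adj(sent_edges)
    p_nodes = list(pattern_labels)
    best = None
    for cand in permutations(sent_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, cand))
        if any(pattern_labels[p] != sent_labels[s] for p, s in mapping.items()):
            continue
        distance = 0.0
        for a, b in pattern_edges:
            hops = shortest_path_len(adj, mapping[a], mapping[b])
            if hops is None:
                break  # endpoints not connected in the sentence graph
            distance += skip_penalty * (hops - 1)
        else:
            if best is None or distance < best[0]:
                best = (distance, mapping)
    return best


def matches(pattern, sentence, threshold, skip_penalty=1.0):
    """Accept the pattern if its best distance is within its own threshold."""
    result = best_match(pattern["labels"], pattern["edges"],
                        sentence["labels"], sentence["edges"], skip_penalty)
    return result is not None and result[0] <= threshold


# Hypothetical pattern: an event trigger directly linked to a protein.
pattern = {"labels": {"p1": "phosphorylation", "p2": "PROTEIN"},
           "edges": [("p1", "p2")]}
# Hypothetical sentence graph: trigger and protein separated by one node.
sentence = {"labels": {"s1": "phosphorylation", "s2": "residue", "s3": "PROTEIN"},
            "edges": [("s1", "s2"), ("s2", "s3")]}

result = best_match(pattern["labels"], pattern["edges"],
                    sentence["labels"], sentence["edges"])
```

In this toy setting one node is skipped, so the same pattern is accepted with a threshold of 1.0 and rejected with a threshold of 0.5, which is the kind of difference a per-pattern threshold would control.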
_version_ 1782400301526941696
author Liu, Haibin
Verspoor, Karin
Comeau, Donald C
MacKinlay, Andrew D
Wilbur, W John
author_facet Liu, Haibin
Verspoor, Karin
Comeau, Donald C
MacKinlay, Andrew D
Wilbur, W John
author_sort Liu, Haibin
collection PubMed
description IN BIONLP-ST 2013: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task, focusing on 13 molecular biology related event types, and the Cancer Genetics (CG) task, targeting a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities. In addition to adapting our system to the two tasks, we also attempted to integrate semantics into the graph matching scheme using a distributional similarity model for more events, and evaluated the event extraction impact of using paths of all possible lengths as key context dependencies beyond using only the shortest paths in our system. We achieved a 46.38% F-score in the CG task (ranking 3rd) and a 48.93% F-score in the GE task (ranking 4th). AFTER BIONLP-ST 2013: We explored three ways to further extend our event extraction system in our previously published work: (1) We allowed non-essential nodes to be skipped, and incorporated a node skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm. (2) Instead of assigning a unified subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern. (3) We implemented the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization. When evaluated on the official GE task test data, these extensions improved the extraction precision from 62% to 65%. However, the overall F-score remained equivalent to the previous performance due to a 1% drop in recall.
format Online
Article
Text
id pubmed-4642081
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4642081 2015-11-19 Optimizing graph-based patterns to extract biomedical events from the literature Liu, Haibin Verspoor, Karin Comeau, Donald C MacKinlay, Andrew D Wilbur, W John BMC Bioinformatics Research BioMed Central 2015-10-30 /pmc/articles/PMC4642081/ /pubmed/26551594 http://dx.doi.org/10.1186/1471-2105-16-S16-S2 Text en Copyright © 2015 Liu et al. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Liu, Haibin
Verspoor, Karin
Comeau, Donald C
MacKinlay, Andrew D
Wilbur, W John
Optimizing graph-based patterns to extract biomedical events from the literature
title Optimizing graph-based patterns to extract biomedical events from the literature
title_full Optimizing graph-based patterns to extract biomedical events from the literature
title_fullStr Optimizing graph-based patterns to extract biomedical events from the literature
title_full_unstemmed Optimizing graph-based patterns to extract biomedical events from the literature
title_short Optimizing graph-based patterns to extract biomedical events from the literature
title_sort optimizing graph-based patterns to extract biomedical events from the literature
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642081/
https://www.ncbi.nlm.nih.gov/pubmed/26551594
http://dx.doi.org/10.1186/1471-2105-16-S16-S2
work_keys_str_mv AT liuhaibin optimizinggraphbasedpatternstoextractbiomedicaleventsfromtheliterature
AT verspoorkarin optimizinggraphbasedpatternstoextractbiomedicaleventsfromtheliterature
AT comeaudonaldc optimizinggraphbasedpatternstoextractbiomedicaleventsfromtheliterature
AT mackinlayandrewd optimizinggraphbasedpatternstoextractbiomedicaleventsfromtheliterature
AT wilburwjohn optimizinggraphbasedpatternstoextractbiomedicaleventsfromtheliterature
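
The description field reports that precision rose from 62% to 65% while the F-score stayed roughly the same because recall dropped by 1%. The short calculation below illustrates why that is plausible. The record does not state the recall values, so the 40% and 39% used here are hypothetical figures chosen only for illustration, and the F-score is assumed to be the balanced harmonic mean of precision and recall.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision values come from the record; recall values are hypothetical.
before = f_score(0.62, 0.40)
after = f_score(0.65, 0.39)

print(f"before: {before:.4f}  after: {after:.4f}  change: {after - before:+.4f}")
```

Under these assumed recall values the two scores differ by only about a tenth of a percentage point: when recall is the lower of the two quantities, the harmonic mean is dominated by it, so a three-point precision gain is largely offset by a one-point recall loss.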