Wide coverage biomedical event extraction using multiple partially overlapping corpora

BACKGROUND: Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires...

Bibliographic Details
Main authors: Miwa, Makoto; Pyysalo, Sampo; Ohta, Tomoko; Ananiadou, Sophia
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3680179/
https://www.ncbi.nlm.nih.gov/pubmed/23731785
http://dx.doi.org/10.1186/1471-2105-14-175
author Miwa, Makoto
Pyysalo, Sampo
Ohta, Tomoko
Ananiadou, Sophia
collection PubMed
description BACKGROUND: Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes.

RESULTS: We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011.

CONCLUSIONS: The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora.
format Online
Article
Text
id pubmed-3680179
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3680179 2013-06-13 BMC Bioinformatics Research Article BioMed Central 2013-06-03 Text en
Copyright © 2013 Miwa et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Wide coverage biomedical event extraction using multiple partially overlapping corpora
topic Research Article