
BioCause: Annotating and analysing causality in the biomedical domain


Bibliographic details
Main authors: Mihăilă, Claudiu; Ohta, Tomoko; Pyysalo, Sampo; Ananiadou, Sophia
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3621543/
https://www.ncbi.nlm.nih.gov/pubmed/23323613
http://dx.doi.org/10.1186/1471-2105-14-2
Collection: PubMed
Description:
BACKGROUND: Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining.

RESULTS: We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This schema has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems.

CONCLUSION: Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new hypotheses for experimental work.
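The abstract reports agreement under an exact match constraint and under a relaxed match setting. The sketch below shows why the relaxed figures come out higher. It assumes two things this record does not state: that agreement is computed as an F1 score with one annotator as reference, and that "relaxed" means any character overlap between spans. The names `exact`, `relaxed` and `agreement_f1` are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Callable, List, Tuple

# (start, end) character offsets, end exclusive
Span = Tuple[int, int]


def exact(a: Span, b: Span) -> bool:
    """Exact match: both span boundaries must coincide."""
    return a == b


def relaxed(a: Span, b: Span) -> bool:
    """Assumed relaxed match: the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def agreement_f1(ann_a: List[Span], ann_b: List[Span],
                 match: Callable[[Span, Span], bool]) -> float:
    """Assumed agreement measure: F1 with annotator A as reference, B as response."""
    if not ann_a and not ann_b:
        return 1.0  # both annotators marked nothing: trivially in agreement
    # Recall: fraction of A's spans that some span of B matches.
    found_a = sum(any(match(a, b) for b in ann_b) for a in ann_a)
    # Precision: fraction of B's spans that some span of A matches.
    found_b = sum(any(match(a, b) for a in ann_a) for b in ann_b)
    recall = found_a / len(ann_a) if ann_a else 0.0
    precision = found_b / len(ann_b) if ann_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with hypothetical trigger spans `A = [(10, 14), (30, 37)]` and `B = [(10, 14), (28, 37)]`, the second pair differs only in its start offset. `agreement_f1(A, B, exact)` therefore gives 0.5, while `agreement_f1(A, B, relaxed)` gives 1.0. Boundary disagreements of this kind are what a relaxed setting forgives.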
Record ID: pubmed-3621543
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research Article)
Published: BioMed Central, 2013-01-16
Record date: 2013-04-10
Copyright © 2013 Mihăilă et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.