
Enriching a biomedical event corpus with meta-knowledge annotation

Bibliographic Details
Main Authors:	Thompson, Paul; Nawaz, Raheel; McNaught, John; Ananiadou, Sophia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3222636/ https://www.ncbi.nlm.nih.gov/pubmed/21985429 http://dx.doi.org/10.1186/1471-2105-12-393

author	Thompson, Paul Nawaz, Raheel McNaught, John Ananiadou, Sophia
collection	PubMed
description	BACKGROUND: Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event. RESULTS: We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme and its application to the GENIA event corpus (1,000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, falling in the range of 0.84-0.93 Kappa.
CONCLUSION: By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, enabling textual inference to detect entailments and contradictions, etc. To our knowledge, our scheme is unique within the field with regard to the diversity of meta-knowledge aspects annotated for each event.
format	Online Article Text
id	pubmed-3222636
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3222636 2011-11-23 Enriching a biomedical event corpus with meta-knowledge annotation Thompson, Paul Nawaz, Raheel McNaught, John Ananiadou, Sophia BMC Bioinformatics Research Article BioMed Central 2011-10-10 /pmc/articles/PMC3222636/ /pubmed/21985429 http://dx.doi.org/10.1186/1471-2105-12-393 Text en Copyright ©2011 Thompson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Enriching a biomedical event corpus with meta-knowledge annotation
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3222636/ https://www.ncbi.nlm.nih.gov/pubmed/21985429 http://dx.doi.org/10.1186/1471-2105-12-393