
BioInfer: a corpus for information extraction in the biomedical domain

BACKGROUND: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annot...

Bibliographic Details
Main authors:	Pyysalo, Sampo; Ginter, Filip; Heimonen, Juho; Björne, Jari; Boberg, Jorma; Järvinen, Jouni; Salakoski, Tapio
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1808065/ https://www.ncbi.nlm.nih.gov/pubmed/17291334 http://dx.doi.org/10.1186/1471-2105-8-50

author	Pyysalo, Sampo; Ginter, Filip; Heimonen, Juho; Björne, Jari; Boberg, Jorma; Järvinen, Jouni; Salakoski, Tapio
collection	PubMed
description	BACKGROUND: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora. RESULTS: We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. CONCLUSION: We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at .
format	Text
id	pubmed-1808065
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1808065 2007-03-13 BMC Bioinformatics Research Article BioMed Central 2007-02-09 /pmc/articles/PMC1808065/ /pubmed/17291334 http://dx.doi.org/10.1186/1471-2105-8-50 Text en Copyright © 2007 Pyysalo et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	BioInfer: a corpus for information extraction in the biomedical domain
topic	Research Article
