
The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes

BACKGROUND: Detecting uncertain and negative assertions is essential in most BioMedical Text Mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on han...


Bibliographic Details
Main Authors: Vincze, Veronika; Szarvas, György; Farkas, Richárd; Móra, György; Csirik, János
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586758/
https://www.ncbi.nlm.nih.gov/pubmed/19025695
http://dx.doi.org/10.1186/1471-2105-9-S11-S9
author Vincze, Veronika
Szarvas, György
Farkas, Richárd
Móra, György
Csirik, János
collection PubMed
description BACKGROUND: Detecting uncertain and negative assertions is essential in most BioMedical Text Mining tasks where, in general, the aim is to derive factual knowledge from textual data. This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus). RESULTS: The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief linguist – also responsible for setting up the annotation guidelines – who resolved cases where the annotators disagreed. The resulting corpus consists of more than 20,000 sentences that were considered for annotation, and over 10% of them actually contain one (or more) linguistic annotation suggesting negation or uncertainty. CONCLUSION: Statistics are reported on corpus size, ambiguity levels and the consistency of annotations. The corpus is accessible for academic purposes and is free of charge. Apart from the intended goal of serving as a common resource for the training, testing and comparing of biomedical Natural Language Processing systems, the corpus is also a good resource for the linguistic analysis of scientific and clinical texts.
format Text
id pubmed-2586758
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2586758 2008-11-26 The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes Vincze, Veronika; Szarvas, György; Farkas, Richárd; Móra, György; Csirik, János BMC Bioinformatics Research BioMed Central 2008-11-19 /pmc/articles/PMC2586758/ /pubmed/19025695 http://dx.doi.org/10.1186/1471-2105-9-S11-S9 Text en Copyright © 2008 Vincze et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586758/
https://www.ncbi.nlm.nih.gov/pubmed/19025695
http://dx.doi.org/10.1186/1471-2105-9-S11-S9